---
language:
- en
tags:
- protein-language-models
- sparse-autoencoder
license: mit
---

# Sparse Autoencoders for ESM-2 (8M)

Interpret protein language model representations using sparse autoencoders (SAEs) trained on ESM-2 (8M) layers. These models decompose complex neural representations into interpretable features, enabling deeper understanding of how protein language models process sequence information.

* 📊 Model details in the [InterPLM pre-print](https://www.biorxiv.org/content/10.1101/2024.11.14.623630v1)
* 👩‍💻 Training and analysis code in the [GitHub repo](https://github.com/ElanaPearl/InterPLM)
* 🧬 Explore features at [InterPLM.ai](https://www.interplm.ai)

## Model Details

- Base Model: ESM-2 8M (6 layers)
- Architecture: Sparse Autoencoder
- Input Dimension: 320
- Feature Dimension: 10,240

## Available Models

We provide SAE models trained on each layer of ESM-2 8M:

| Model name | ESM-2 model | ESM-2 layer |
|-|-|-|
| [InterPLM-esm2-8m-l1](https://huggingface.co/Elana/InterPLM-esm2-8m/tree/main/layer_1) | esm2_t6_8M_UR50D | 1 |
| [InterPLM-esm2-8m-l2](https://huggingface.co/Elana/InterPLM-esm2-8m/tree/main/layer_2) | esm2_t6_8M_UR50D | 2 |
| [InterPLM-esm2-8m-l3](https://huggingface.co/Elana/InterPLM-esm2-8m/tree/main/layer_3) | esm2_t6_8M_UR50D | 3 |
| [InterPLM-esm2-8m-l4](https://huggingface.co/Elana/InterPLM-esm2-8m/tree/main/layer_4) | esm2_t6_8M_UR50D | 4 |
| [InterPLM-esm2-8m-l5](https://huggingface.co/Elana/InterPLM-esm2-8m/tree/main/layer_5) | esm2_t6_8M_UR50D | 5 |
| [InterPLM-esm2-8m-l6](https://huggingface.co/Elana/InterPLM-esm2-8m/tree/main/layer_6) | esm2_t6_8M_UR50D | 6 |

All models share the same architecture and dictionary size (10,240). See [here](https://huggingface.co/Elana/InterPLM-esm2-650m) for SAEs trained on ESM-2 650M. The 650M SAEs capture more known biological concepts than the 8M SAEs, but require additional compute for both ESM embedding and SAE feature extraction.

## Usage

Extract interpretable features from a protein sequence:

```python
from interplm.sae.inference import load_sae_from_hf
from interplm.esm.embed import embed_single_sequence

# Get ESM embeddings for a protein sequence
embeddings = embed_single_sequence(
    sequence="MRWQEMGYIFYPRKLR",
    model_name="esm2_t6_8M_UR50D",
    layer=4  # Choose ESM layer (1-6)
)

# Load the SAE model and extract features
sae = load_sae_from_hf(plm_model="esm2-8m", plm_layer=4)
features = sae.encode(embeddings)
```

For detailed training and analysis examples, see the [GitHub README](https://github.com/ElanaPearl/InterPLM/blob/main/README.md).

## Model Variants

The SAEs we've trained have arbitrary scales across features, since encoder/decoder weights can be linearly rescaled without changing the reconstructions. To make features comparable, we normalize each feature to activate between 0 and 1, based on its maximum activation value over Swiss-Prot (our primary analysis dataset). By default, use the pre-normalized SAEs (`ae_normalized.pt`). Because this normalization may not perfectly scale features that rarely activate on Swiss-Prot proteins, you can instead compute a custom normalization from `ae_unnormalized.pt` with [this code](https://github.com/ElanaPearl/InterPLM/blob/main/interplm/sae/normalize.py).
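
The idea behind a custom normalization is simple: track each feature's maximum activation over your own reference set of proteins, then divide raw activations by those maxima. The sketch below is a minimal illustration of that approach, assuming `sae` was loaded from `ae_unnormalized.pt` and embeddings are obtained as in the Usage section; the helper functions here are hypothetical and not part of the InterPLM package, so use the linked `normalize.py` for the supported workflow.

```python
import torch

# Minimal sketch of custom feature normalization (illustrative only;
# these helpers are NOT part of the InterPLM API).
# Assumes `sae` was loaded from ae_unnormalized.pt and `embedding_batches`
# yields ESM embedding tensors, as produced in the Usage example above.

def compute_feature_maxima(sae, embedding_batches):
    """Track the per-feature maximum SAE activation across batches of embeddings."""
    max_acts = None
    for embeddings in embedding_batches:
        feats = sae.encode(embeddings)  # shape: (n_tokens, n_features)
        batch_max = feats.max(dim=0).values
        max_acts = batch_max if max_acts is None else torch.maximum(max_acts, batch_max)
    return max_acts

def normalize_features(features, max_acts, eps=1e-8):
    """Rescale raw SAE activations so each feature lies roughly in [0, 1]."""
    return features / (max_acts + eps)
```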