---
license: mit
datasets:
- charlieoneill/csLG
- JSALT2024-Astro-LLMs/astro_paper_corpus
language:
- en
tags:
- sparse-autoencoder
- embeddings
- interpretability
- scientific-nlp
---

# Sparse Autoencoders for Scientific Paper Embeddings

This repository contains a collection of sparse autoencoders (SAEs) trained on embeddings of scientific papers from two domains: Computer Science (cs.LG) and Astrophysics (astro.PH). These SAEs are designed to disentangle the semantic concepts packed into dense embeddings while maintaining semantic fidelity.

## Model Description

### Overview

The SAEs in this repository are trained on embeddings of scientific paper abstracts from arXiv, specifically from the cs.LG (Computer Science - Machine Learning) and astro.PH (Astrophysics) categories. They are designed to extract interpretable features from dense text embeddings derived from large language models.

### Model Architecture

Each SAE follows a top-k architecture, in which only the k largest latent activations are kept for each input (a minimal sketch follows this list). The hyperparameters vary across checkpoints:
- k: number of active latents (16, 32, 64, or 128)
- n: total number of latents (3072, 4608, 6144, 9216, or 12288)
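
Below is a minimal sketch of the top-k forward pass, assuming a standard linear encoder/decoder pair; the class and variable names are illustrative, and the repository's actual module definition may differ.

```python
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    """Sketch of a top-k sparse autoencoder (illustrative, not the exact module)."""

    def __init__(self, d_model: int, n_latents: int, k: int):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, n_latents)
        self.decoder = nn.Linear(n_latents, d_model)

    def forward(self, x: torch.Tensor):
        # Encode, then keep only the k largest activations per sample.
        z = torch.relu(self.encoder(x))
        top = torch.topk(z, self.k, dim=-1)
        z_sparse = torch.zeros_like(z).scatter_(-1, top.indices, top.values)
        # Reconstruct the dense embedding from the sparse code.
        x_hat = self.decoder(z_sparse)
        return x_hat, z_sparse
```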

The naming convention for the models is:
`{domain}_{k}_{n}_{batch_size}.pth`

For example, `csLG_128_3072_256.pth` represents an SAE trained on cs.LG data with k=128, n=3072, and a batch size of 256.
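
Continuing the sketch above, the snippet below shows how a checkpoint might be loaded and applied. It assumes the `.pth` files hold a plain `state_dict` compatible with the hypothetical `TopKSAE` class; the embedding dimension (1536 here) is an assumption and should match whatever model produced your embeddings.

```python
import torch

# Parse hyperparameters out of the checkpoint name (hypothetical helper logic).
name = "csLG_128_3072_256.pth"
domain, k, n, batch_size = name.removesuffix(".pth").split("_")

sae = TopKSAE(d_model=1536, n_latents=int(n), k=int(k))  # d_model is an assumption
sae.load_state_dict(torch.load(name, map_location="cpu"))
sae.eval()

# Encode a batch of abstract embeddings into sparse, interpretable features.
embeddings = torch.randn(4, 1536)  # stand-in for real abstract embeddings
with torch.no_grad():
    reconstruction, features = sae(embeddings)
```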

## Intended Uses & Limitations

These SAEs are primarily intended for:
1. Extracting interpretable features from dense embeddings of scientific texts
2. Enabling fine-grained control over semantic search in scientific literature (see the sketch after this list)
3. Studying the structure of semantic spaces in specific scientific domains
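
As one hedged illustration of item 2, continuing the loading example above: a latent can be zeroed or amplified before decoding to steer a query embedding. The feature index here is arbitrary; which latent encodes which concept must be established by inspection.

```python
# Suppress one hypothetical concept feature, then decode back to a dense
# embedding usable for nearest-neighbour search.
features[:, 42] = 0.0  # index 42 is an arbitrary example, not a known feature
with torch.no_grad():
    steered_embedding = sae.decoder(features)
```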

Limitations:
- The models are domain-specific (cs.LG and astro.PH) and may not generalize well to other domains
- Performance may vary depending on the quality and domain-specificity of the input embeddings

## Training Data

The SAEs were trained on embeddings of abstracts from:
- cs.LG: 153,000 papers
- astro.PH: 272,000 papers

## Training Procedure

The SAEs were trained using a custom loss function combining reconstruction loss, sparsity constraints, and an auxiliary loss. For detailed training procedures, please refer to our paper (link to be added upon publication).
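
Since the exact loss terms are not spelled out here, the following is only a schematic sketch: reconstruction MSE plus a weighted auxiliary reconstruction term, a common choice for reviving dead latents in top-k SAEs (where sparsity itself is enforced structurally by the top-k selection). `lambda_aux` and `x_hat_aux` are placeholders, not the paper's exact formulation.

```python
import torch.nn.functional as F

def sae_loss(x, x_hat, x_hat_aux=None, lambda_aux=1.0 / 32):
    # Reconstruction loss between the input embedding and its reconstruction.
    loss = F.mse_loss(x_hat, x)
    if x_hat_aux is not None:
        # Auxiliary loss, e.g. a reconstruction computed from otherwise-dead
        # latents; the paper's true auxiliary term may differ.
        loss = loss + lambda_aux * F.mse_loss(x_hat_aux, x)
    return loss
```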

## Evaluation Results

Performance metrics for various configurations:

| k   | n     | Domain   | MSE    | Log FD  | Act Mean |
|-----|-------|----------|--------|---------|----------|
| 16  | 3072  | astro.PH | 0.2264 | -2.7204 | 0.1264   |
| 16  | 3072  | cs.LG    | 0.2284 | -2.7314 | 0.1332   |
| 64  | 9216  | astro.PH | 0.1182 | -2.4682 | 0.0539   |
| 64  | 9216  | cs.LG    | 0.1240 | -2.3536 | 0.0545   |
| 128 | 12288 | astro.PH | 0.0936 | -2.7025 | 0.0399   |
| 128 | 12288 | cs.LG    | 0.0942 | -2.0858 | 0.0342   |

- MSE: normalised mean squared error
- Log FD: mean log density of feature activations
- Act Mean: mean activation value across non-zero features
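
The precise metric definitions live in the paper; the sketch below shows one plausible reading of the three columns, with the normalisation and the log base (base 10 here) as assumptions.

```python
import torch

def sae_metrics(x, x_hat, z_sparse, eps=1e-10):
    # Normalised MSE: reconstruction error relative to the variance of x.
    mse = ((x_hat - x) ** 2).mean() / ((x - x.mean(dim=0)) ** 2).mean()
    # Log feature density: how often each latent fires across the dataset.
    density = (z_sparse > 0).float().mean(dim=0)
    log_fd = torch.log10(density + eps).mean()
    # Mean activation over the non-zero entries only.
    act_mean = z_sparse[z_sparse > 0].mean()
    return mse.item(), log_fd.item(), act_mean.item()
```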

For full results, please refer to our paper (link to be added upon publication).

## Ethical Considerations

While these models are designed to improve interpretability, users should be aware that:
1. The extracted features may reflect biases present in the scientific literature used for training
2. Interpretations of the features should be validated carefully, especially when used for decision-making processes

## Citation

If you use these models in your research, please cite our paper (citation to be added upon publication).

## Additional Information

For more details on the methodology, feature families, and applications in semantic search, please refer to our full paper (link to be added upon publication).