charlieoneill committed on
Commit 5a7816e
1 Parent(s): 7633ea4

Create README.md

Files changed (1)
  1. README.md +88 -0
README.md ADDED

---
license: mit
datasets:
- charlieoneill/csLG
- JSALT2024-Astro-LLMs/astro_paper_corpus
language:
- en
tags:
- sparse-autoencoder
- embeddings
- interpretability
- scientific-nlp
---

# Sparse Autoencoders for Scientific Paper Embeddings

This repository contains a collection of Sparse Autoencoders (SAEs) trained on embeddings of scientific papers from two domains: Computer Science (cs.LG) and Astrophysics (astro.PH). These SAEs are designed to disentangle semantic concepts in dense embeddings while maintaining semantic fidelity.

## Model Description

### Overview

The SAEs in this repository are trained on embeddings of scientific paper abstracts from arXiv, specifically from the cs.LG (Computer Science - Machine Learning) and astro.PH (Astrophysics) categories. They are designed to extract interpretable features from dense text embeddings derived from large language models.

### Model Architecture

Each SAE follows a top-k architecture with varying hyperparameters:
- k: number of active latents (16, 32, 64, or 128)
- n: total number of latents (3072, 4608, 6144, 9216, or 12288)

The naming convention for the models is:
`{domain}_{k}_{n}_{batch_size}.pth`

For example, `csLG_128_3072_256.pth` represents an SAE trained on cs.LG data with k=128, n=3072, and a batch size of 256.
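
To make the architecture concrete, here is a minimal PyTorch sketch of a top-k SAE forward pass. It illustrates the general technique rather than this repository's exact module: the class name, layer layout, and the `d_model=1536` placeholder are assumptions, and the saved state dicts may use different parameter names.

```python
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    """Minimal top-k sparse autoencoder: keep only the k largest
    pre-activations per input, zero the rest, then reconstruct."""

    def __init__(self, d_model: int, n_latents: int, k: int):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, n_latents)
        self.decoder = nn.Linear(n_latents, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pre = self.encoder(x)                   # project to n latents
        topk = torch.topk(pre, self.k, dim=-1)  # select the k largest
        latents = torch.zeros_like(pre)
        latents.scatter_(-1, topk.indices, torch.relu(topk.values))
        return self.decoder(latents)            # reconstruct the embedding

# e.g. csLG_128_3072_256.pth -> k=128, n=3072; d_model depends on the
# embedding model, and 1536 is only a placeholder here.
sae = TopKSAE(d_model=1536, n_latents=3072, k=128)
```

If a checkpoint's state dict matches this layout, loading would look like `sae.load_state_dict(torch.load("csLG_128_3072_256.pth"))`; in practice, inspect the checkpoint's keys first.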

## Intended Uses & Limitations

These SAEs are primarily intended for:
1. Extracting interpretable features from dense embeddings of scientific texts
2. Enabling fine-grained control over semantic search in scientific literature (see the sketch after this list)
3. Studying the structure of semantic spaces in specific scientific domains
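
As an illustration of use case 2, one way to steer semantic search is to encode a query embedding with the SAE, clamp an individual latent, and decode back into embedding space before running nearest-neighbour retrieval. The sketch below reuses the hypothetical `TopKSAE` module from the architecture section; the feature index and boost value are arbitrary placeholders.

```python
import torch

def steer_embedding(sae, embedding: torch.Tensor,
                    feature_idx: int, value: float) -> torch.Tensor:
    """Clamp one SAE latent to a chosen value and decode back to a
    dense embedding, e.g. to up-weight a concept during retrieval."""
    with torch.no_grad():
        pre = sae.encoder(embedding)
        topk = torch.topk(pre, sae.k, dim=-1)
        latents = torch.zeros_like(pre)
        latents.scatter_(-1, topk.indices, torch.relu(topk.values))
        latents[..., feature_idx] = value   # intervene on one feature
        return sae.decoder(latents)

# Hypothetical usage: boost latent 42 in a query embedding, then run
# nearest-neighbour search with the steered embedding.
# steered = steer_embedding(sae, query_embedding, feature_idx=42, value=5.0)
```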

Limitations:
- The models are domain-specific (cs.LG and astro.PH) and may not generalize well to other domains
- Performance may vary depending on the quality and domain-specificity of the input embeddings

## Training Data

The SAEs were trained on embeddings of abstracts from:
- cs.LG: 153,000 papers
- astro.PH: 272,000 papers

## Training Procedure

The SAEs were trained using a custom loss function combining reconstruction loss, sparsity constraints, and an auxiliary loss. For detailed training procedures, please refer to our paper (link to be added upon publication).
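
For intuition, the sketch below shows one plausible shape for such a composite objective: a mean-squared reconstruction term, an L1 sparsity penalty, and a generic auxiliary reconstruction term. The specific terms and coefficients are illustrative assumptions, not the exact loss from the paper.

```python
from typing import Optional

import torch
import torch.nn.functional as F

def sae_loss(x: torch.Tensor, x_hat: torch.Tensor, latents: torch.Tensor,
             aux_recon: Optional[torch.Tensor] = None,
             l1_coeff: float = 1e-3, aux_coeff: float = 1 / 32) -> torch.Tensor:
    """Composite SAE objective; every term and coefficient here is a
    placeholder standing in for the paper's custom loss."""
    recon = F.mse_loss(x_hat, x)                        # reconstruction loss
    sparsity = l1_coeff * latents.abs().sum(-1).mean()  # L1 sparsity penalty
    aux = F.mse_loss(aux_recon, x) if aux_recon is not None else x.new_zeros(())
    return recon + sparsity + aux_coeff * aux           # weighted combination
```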

## Evaluation Results

Performance metrics for various configurations:

| k   | n     | Domain   | MSE    | Log FD  | Act Mean |
|-----|-------|----------|--------|---------|----------|
| 16  | 3072  | astro.PH | 0.2264 | -2.7204 | 0.1264   |
| 16  | 3072  | cs.LG    | 0.2284 | -2.7314 | 0.1332   |
| 64  | 9216  | astro.PH | 0.1182 | -2.4682 | 0.0539   |
| 64  | 9216  | cs.LG    | 0.1240 | -2.3536 | 0.0545   |
| 128 | 12288 | astro.PH | 0.0936 | -2.7025 | 0.0399   |
| 128 | 12288 | cs.LG    | 0.0942 | -2.0858 | 0.0342   |

- MSE: Normalised Mean Squared Error
- Log FD: Mean log density of feature activations
- Act Mean: Mean activation value across non-zero features
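
For reference, "normalised" MSE is usually the raw reconstruction error divided by the error of a trivial baseline. One common convention, assumed here for illustration only, normalises by the error of predicting the mean embedding:

```python
import torch

def normalised_mse(x: torch.Tensor, x_hat: torch.Tensor) -> torch.Tensor:
    """Reconstruction MSE divided by the MSE of the mean-embedding
    baseline; the paper's exact normalisation may differ."""
    mse = (x - x_hat).pow(2).mean()
    baseline = (x - x.mean(dim=0, keepdim=True)).pow(2).mean()
    return mse / baseline
```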

For full results, please refer to our paper (link to be added upon publication).

## Ethical Considerations

While these models are designed to improve interpretability, users should be aware that:
1. The extracted features may reflect biases present in the scientific literature used for training
2. Interpretations of the features should be validated carefully, especially when used in decision-making processes

## Citation

If you use these models in your research, please cite our paper (citation to be added upon publication).

## Additional Information

For more details on the methodology, feature families, and applications in semantic search, please refer to our full paper (link to be added upon publication).