Upload README.md with huggingface_hub
README.md CHANGED
@@ -6,7 +6,7 @@ language:
 ---
 
 # bigram-subnetworks-pythia-410m
-We release bigram subnetworks as described in [Chang and Bergen (2025)](https://
+We release bigram subnetworks as described in [Chang and Bergen (2025)](https://arxiv.org/abs/2504.15471).
 These are sparse subsets of model parameters that recreate bigram predictions (next token predictions conditioned only on the current token) in Transformer language models.
 This repository contains the bigram subnetwork for [EleutherAI/pythia-410m](https://huggingface.co/EleutherAI/pythia-410m).
 
@@ -14,7 +14,7 @@ This repository contains the bigram subnetwork for [EleutherAI/pythia-410m](http
 
 A subnetwork file is a pickled Python dictionary that maps the original model parameter names to numpy binary masks with the same shapes as the original model parameters (1: keep, 0: drop).
 For details on usage, see: https://github.com/tylerachang/bigram-subnetworks.
-For details on how these subnetworks were trained, see
+For details on how these subnetworks were trained, see [Chang and Bergen (2025)](https://arxiv.org/abs/2504.15471).
 
 For minimal usage, download the code at https://github.com/tylerachang/bigram-subnetworks (or just the file `circuit_loading_utils.py`) and run in Python:
 ```
@@ -30,5 +30,6 @@ model, tokenizer, config = load_subnetwork_model('EleutherAI/pythia-410m', mask_
 author={Chang, Tyler A. and Bergen, Benjamin K.},
 journal={Preprint},
 year={2024},
+url={https://arxiv.org/abs/2504.15471},
 }
 </pre>
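The mask format described in the README is enough to apply a subnetwork by hand. Below is a minimal sketch, assuming a locally downloaded mask file (the filename `bigram_mask.pkl` is hypothetical) and assuming the mask keys match the names from `model.named_parameters()`; it interprets each mask as an elementwise multiplier on the base model's weights (1: keep, 0: drop). The repository's documented route is `load_subnetwork_model` in `circuit_loading_utils.py`, as in the hunk context above.

```python
# Minimal sketch (not the repository's code): load a pickled mask dict and
# zero out dropped parameters in the base model.
import pickle

import numpy as np
import torch
from transformers import AutoModelForCausalLM

with open("bigram_mask.pkl", "rb") as f:  # hypothetical local path
    masks = pickle.load(f)  # dict: parameter name -> binary numpy mask

model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-410m")
with torch.no_grad():
    for name, param in model.named_parameters():
        if name in masks:
            # 1 keeps a weight, 0 drops it, per the README's convention.
            param.mul_(torch.from_numpy(masks[name].astype(np.float32)))
```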