tylerachang/bigram-subnetworks-pythia-410m

tylerachang committed on Apr 21

Commit

74784c6

1 Parent(s): 383cf90

Upload README.md with huggingface_hub

Files changed (1)

README.md ADDED

+---
+license: apache-2.0
+language:
+- eng
+---
+# bigram-subnetworks-pythia-410m
+We release bigram subnetworks as described in [Chang and Bergen (2025)](https://tylerachang.github.io/).
+These are sparse subsets of model parameters that recreate bigram predictions (next token predictions conditioned only on the current token) in Transformer language models.
+This repository contains the bigram subnetwork for [EleutherAI/pythia-410m](https://huggingface.co/EleutherAI/pythia-410m).
+## Format
+A subnetwork file is a pickled Python dictionary that maps each original model parameter name to a binary NumPy mask with the same shape as that parameter (1: keep, 0: drop).
+For details on usage, see: https://github.com/tylerachang/bigram-subnetworks.
+For details on how these subnetworks were trained, see the paper linked above.
+For minimal usage, download the code from https://github.com/tylerachang/bigram-subnetworks (or just the file `circuit_loading_utils.py`) and run the following in Python:
+```python
+from circuit_loading_utils import load_bigram_subnetwork_dict, load_subnetwork_model
+mask_dict = load_bigram_subnetwork_dict('EleutherAI/pythia-410m')
+model, tokenizer, config = load_subnetwork_model('EleutherAI/pythia-410m', mask_dict)
+```
+## Citation
+<pre>
+@article{chang-bergen-2025-bigram,
+  title={Bigram Subnetworks: Mapping to Next Tokens in Transformer Language Models},
+  author={Chang, Tyler A. and Bergen, Benjamin K.},
+  journal={Preprint},
+  year={2025},
+}
+</pre>
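
The mask format described in the README's Format section can be illustrated with a small sketch. It is not the repository's loading code: the parameter names, shapes, and the `apply_masks` helper are hypothetical, and it assumes that "drop" means setting the masked-out entries to zero. The dictionary is round-tripped through `pickle` in memory to stand in for reading a subnetwork file.

```python
import io
import pickle

import numpy as np

# Toy stand-ins for real model parameters (names and shapes are hypothetical).
params = {
    "layer.weight": np.array([[0.5, -1.0], [2.0, 3.0]]),
    "layer.bias": np.array([0.1, -0.2]),
}

# A subnetwork in the described format: parameter name -> binary mask
# with the same shape as the parameter (1: keep, 0: drop).
mask_dict = {
    "layer.weight": np.array([[1, 0], [0, 1]]),
    "layer.bias": np.array([0, 1]),
}

# Round-trip through pickle in memory to mimic reading a subnetwork file.
buffer = io.BytesIO()
pickle.dump(mask_dict, buffer)
buffer.seek(0)
loaded_masks = pickle.load(buffer)


def apply_masks(params, masks):
    """Zero out every entry whose mask value is 0 (assumed meaning of 'drop')."""
    masked = {}
    for name, value in params.items():
        mask = masks[name]
        if mask.shape != value.shape:
            raise ValueError(f"mask shape mismatch for {name}")
        masked[name] = value * mask
    return masked


masked_params = apply_masks(params, loaded_masks)

# Count how many parameters the subnetwork keeps out of the total.
kept = sum(int(m.sum()) for m in loaded_masks.values())
total = sum(m.size for m in loaded_masks.values())
```

With real subnetwork files, the same `kept / total` ratio gives a quick measure of how sparse the subnetwork is; for actual model loading, the `load_subnetwork_model` helper shown in the README is the route the authors document.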