Commit
1307814
1 Parent(s): db820de

Create README.md (#1)

- Create README.md (eb87be668c8ef0112d034ee9475e9622c0446098)


Co-authored-by: David Brandfonbrener

Files changed (1)
  1. README.md +27 -0
README.md ADDED
@@ -0,0 +1,27 @@
+ # CoLoR-filter
+
+ See accompanying code at: https://github.com/davidbrandfonbrener/color-filter-olmo
+
+
+ To download the data, we recommend using `huggingface-cli`.
+
+ To download all the data, run `huggingface-cli download hlzhang109/CoLoR-filter --local-dir YOUR_PATH`.
+
+ This downloads the data to your Hugging Face cache and creates a local directory at `YOUR_PATH` containing symbolic links to the cached files. If you want the data itself stored at `YOUR_PATH`, pass that path as the `--cache-dir` in the command.
+
+ WARNING: the data is large because it includes a copy of tokenized C4, which ensures that the selected data indices match the tokenized raw data. The C4 data is ~300GB and the rest of the repo is ~50GB, of which ~45GB is the 1.2B model and optimizer checkpoints.
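
Given these sizes, checking free disk space before starting can save a failed download. The following is a minimal sketch in Python using only the standard library. The ~350 GB total is simply the sum of the two approximate figures quoted in the warning, the sizes are assumed to be decimal gigabytes, and `has_room_for_download` is a hypothetical helper name, not something provided by the repository.

```python
import shutil

# Approximate sizes quoted in the README warning (assumed decimal gigabytes).
C4_GB = 300
REST_GB = 50
TOTAL_GB = C4_GB + REST_GB  # roughly 350 GB for the full repository


def has_room_for_download(path: str, required_gb: float = TOTAL_GB) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 10**9


if __name__ == "__main__":
    # "." stands in for YOUR_PATH; the directory must already exist.
    print(has_room_for_download("."))
```

If only the models are needed, passing a smaller `required_gb` (for example 50) would be the more appropriate threshold.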
+
+ If you only want to download some files (e.g. just the models), add the `--include` flag with a glob pattern. For example, `huggingface-cli download hlzhang109/CoLoR-filter --local-dir YOUR_PATH --include "models/*"`.
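
The same partial download can likely be done from Python. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function, whose `allow_patterns` argument plays the role of the CLI's `--include` flag. The helper names `build_download_kwargs` and `download` are hypothetical, and `YOUR_PATH` remains a placeholder as in the commands above.

```python
from typing import Optional

REPO_ID = "hlzhang109/CoLoR-filter"  # repository named in the CLI commands


def build_download_kwargs(local_dir: str, include: Optional[str] = None) -> dict:
    """Build keyword arguments for huggingface_hub.snapshot_download."""
    kwargs = {"repo_id": REPO_ID, "local_dir": local_dir}
    if include is not None:
        # allow_patterns restricts the download to files matching the glob.
        kwargs["allow_patterns"] = [include]
    return kwargs


def download(local_dir: str, include: Optional[str] = None) -> str:
    """Download the repository (or a subset) and return the local path."""
    # Imported lazily so the helper above works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**build_download_kwargs(local_dir, include))


# Example call (performs a real, potentially large download):
# download("YOUR_PATH", include="models/*")
```

Leaving `include` unset corresponds to downloading the whole repository, so the size warning applies.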
+
+ ## Citation
+
+ If you use this code in your research, please cite the following paper:
+
+ ```bibtex
+ @inproceedings{,
+ title={},
+ author={},
+ booktitle={},
+ year={},
+ }
+ ```