Update README.md
README.md CHANGED

@@ -75,8 +75,7 @@ The model was not finetuned or post-trained, but due to inclusion of instruction
messages = []
messages.append({"role": "system", "content": "You are a helpful assistant."})
messages.append({"role": "user", "content": "What do you think of Goethe's Faust?"})
-
- chat_input = tokenizer.apply_chat_template(formatted_messages, tokenize=False, add_generation_prompt=True)
+ chat_input = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(chat_input)
input_ids = tokenizer.encode(chat_input, return_tensors="pt", add_special_tokens=False).to(device)

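For context, the snippet in this hunk assumes that `tokenizer`, `model`, and `device` are already defined elsewhere in the README. A minimal, hypothetical sketch of that setup and of the generation step might look like the following; `MODEL_ID` is a placeholder rather than the actual repository id, and `trust_remote_code=True` is an assumption for a custom architecture, not something this diff states.

```python
# Hypothetical end-to-end usage sketch; not part of the README diff above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "..."  # placeholder: substitute the actual Hugging Face repo id
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# trust_remote_code=True is assumed here because the checkpoint uses a custom architecture.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True).to(device)

messages = []
messages.append({"role": "system", "content": "You are a helpful assistant."})
messages.append({"role": "user", "content": "What do you think of Goethe's Faust?"})

# Render the chat template to a string, then tokenize it explicitly,
# mirroring the two-step pattern used in the README snippet.
chat_input = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
input_ids = tokenizer.encode(chat_input, return_tensors="pt", add_special_tokens=False).to(device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0, input_ids.shape[1]:], skip_special_tokens=True))
```
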
@@ -153,7 +152,10 @@ After finishing all iterations, the coda block processes the last state and prod
Please refer to the paper for benchmark performance on standard benchmarks.

## Limitations
- Our checkpoint is trained for only 47000 steps on a broadly untested mixture
+ Our checkpoint is trained for only 47,000 steps on a broadly untested data mixture with a constant learning rate. As an academic project, the model is trained only on publicly available data, and its 800B-token budget, while large compared to older fully open-source models such as the Pythia series, is small compared to modern open-source efforts such as OLMo and tiny compared to the datasets used to train industrial open-weight models.
+
+ ## Technical Specifications
+ This model was trained on 21 segments of 4096 AMD MI250X GPUs on the OLCF Frontier supercomputer in early December 2024. The model was trained using ROCm 6.2.0 and the PyTorch 2.6 nightly pre-release of 2024-11-02. The code used to train the model can be found at https://github.com/seal-rg/recurrent-pretraining.

## License
This model is released under the [apache-2.0](https://choosealicense.com/licenses/apache-2.0/) licence.