Update README.md
README.md CHANGED

@@ -75,8 +75,7 @@ The model was not finetuned or post-trained, but due to inclusion of instruction
messages = []
messages.append({"role": "system", "content": "You are a helpful assistant."})
messages.append({"role": "user", "content": "What do you think of Goethe's Faust?"})
-
- chat_input = tokenizer.apply_chat_template(formatted_messages, tokenize=False, add_generation_prompt=True)
+ chat_input = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(chat_input)
input_ids = tokenizer.encode(chat_input, return_tensors="pt", add_special_tokens=False).to(device)

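For context, the snippet in this hunk assumes that `tokenizer`, `model`, and `device` are already defined elsewhere in the README. A minimal, hypothetical sketch of that setup and of the generation step might look like the following; `MODEL_ID` is a placeholder rather than the actual repository id, and `trust_remote_code=True` is an assumption for a custom architecture, not something this diff states.

```python
# Hypothetical end-to-end usage sketch; not part of the README diff above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "..."  # placeholder: substitute the actual Hugging Face repo id
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# trust_remote_code=True is assumed here because the checkpoint uses a custom architecture.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True).to(device)

messages = []
messages.append({"role": "system", "content": "You are a helpful assistant."})
messages.append({"role": "user", "content": "What do you think of Goethe's Faust?"})

# Render the chat template to a string, then tokenize it explicitly,
# mirroring the two-step pattern used in the README snippet.
chat_input = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
input_ids = tokenizer.encode(chat_input, return_tensors="pt", add_special_tokens=False).to(device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0, input_ids.shape[1]:], skip_special_tokens=True))
```
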
@@ -153,7 +152,10 @@ After finishing all iterations, the coda block processes the last state and prod
Please refer to the paper for benchmark performance on standard benchmarks.

## Limitations
- Our checkpoint is trained for only 47000 steps on a broadly untested mixture
+ Our checkpoint is trained for only 47,000 steps on a broadly untested data mixture with a constant learning rate. As an academic project, the model is trained only on publicly available data, and its 800B-token budget, while large compared to older fully open-source models such as the Pythia series, is small compared to modern open-source efforts such as OLMo and tiny compared to the datasets used to train industrial open-weight models.
+
+ ## Technical Specifications
+ This model was trained on 21 segments of 4096 AMD MI250X GPUs on the OLCF Frontier supercomputer in early December 2024. The model was trained using ROCm 6.2.0 and the PyTorch 2.6 nightly pre-release of 2024-11-02. The code used to train the model can be found at https://github.com/seal-rg/recurrent-pretraining.

## License
This model is released under the [apache-2.0](https://choosealicense.com/licenses/apache-2.0/) licence.