Update README.md
# Huginn-0125

This is Huginn, version 01/25, a latent recurrent-depth model with 3.5B parameters, trained for 800B tokens on AMD MI250X machines. It is a proof-of-concept model, but surprisingly capable in reasoning and code given its training budget and size.

All details on this model can be found in the tech report: "Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach" (https://www.arxiv.org/abs/2502.05171).

For more information, see the paper page: https://huggingface.co/papers/2502.05171.
8 intermediate checkpoints of the model can be found in its collection. Additional intermediate checkpoints are available upon request while we find a place to host all ~350 of them. The data used to train this model is publicly available (entirely on Hugging Face), and scripts provided with the pretraining code at https://github.com/seal-rg/recurrent-pretraining can be used to repeat our preprocessing and our entire training run.