HuggingFaceTB
/

SmolLM3-3B-checkpoints

Model card Files Files and versions

eliebak HF Staff commited on about 22 hours ago

Commit

6df90a1

·

verified ·

1 Parent(s): 8007a2c

fix

not me putting the readme in the commit msg 😅

Files changed (1) hide show

README.md +65 -1

README.md CHANGED Viewed

@@ -12,4 +12,68 @@ language:
   - ru
 base_model:
   - HuggingFaceTB/SmolLM3-3B-Base
----

   - ru
 base_model:
   - HuggingFaceTB/SmolLM3-3B-Base
+---
+# SmolLM3 Checkpoints
+We are releasing intermediate checkpoints of SmolLM3 to enable further research.
+## Pre-training
+We release checkpoints every 40,000 steps, which equals 94.4B tokens.
+The GBS (Global Batch Size) in tokens for SmolLM3-3B is 2,359,296. To calculate the number of tokens from a given step:
+```python
+nb_tokens = nb_step * GBS
+```
+### Training Stages
+**Stage 1:** Steps 0 to 3,450,000 (86 checkpoints)
+[config](https://huggingface.co/datasets/HuggingFaceTB/smollm3-configs/blob/main/stage1_8T.yaml)
+**Stage 2:** Steps 3,450,000 to 4,200,000 (19 checkpoints)
+[config](https://huggingface.co/datasets/HuggingFaceTB/smollm3-configs/blob/main/stage2_8T_9T.yaml)
+**Stage 3:** Steps 4,200,000 to 4,720,000 (13 checkpoints)
+[config](https://huggingface.co/datasets/HuggingFaceTB/smollm3-configs/blob/main/stage3_9T_11T.yaml)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/61c141342aac764ce1654e43/944zWNgcI1I06RZuoP11B.png)
+### Long Context Extension
+For the additional 2 stages that extend the context length to 64k, we sample checkpoints every 4,000 steps (9.4B tokens) for a total of 10 checkpoints:
+**Long Context 4k to 32k**
+[config](https://huggingface.co/datasets/HuggingFaceTB/smollm3-configs/blob/main/long_context_4k_to_32k.yaml)
+**Long Context 32k to 64k**
+[config](https://huggingface.co/datasets/HuggingFaceTB/smollm3-configs/blob/main/long_context_32k_to_64k.yaml)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/61c141342aac764ce1654e43/jBOiemVtbfi9YD7Pki6sY.png)
+## Post-training
+We release checkpoints at every step of our post-training recipe: Mid training, SFT, APO soup, and LC expert.
+![image.png](https://cdn-uploads.huggingface.co/production/uploads/651e96991b97c9f33d26bde6/bDzh-A5X-gi3mY_RbLOSB.png)
+## How to Load a Checkpoint
+```python
+# pip install transformers
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+checkpoint = "HuggingFaceTB/SmolLM3-3B-checkpoints"
+revision = "stage1-step-40000" # replace by the revision you want
+device = torch.device("cuda" if torch.cuda.is_available() else "mps" if hasattr(torch, 'mps') and torch.mps.is_available() else "cpu")
+tokenizer = AutoTokenizer.from_pretrained(checkpoint, revision=revision)
+model = AutoModelForCausalLM.from_pretrained(checkpoint, revision=revision).to(device)
+inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device)
+outputs = model.generate(inputs)
+print(tokenizer.decode(outputs[0]))
+```
+## License
+[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)