
SmolLM3 Checkpoints

We are releasing intermediate checkpoints of SmolLM3 to enable further research.

Pre-training

We release checkpoints every 40,000 steps, which corresponds to 94.4B tokens. The global batch size (GBS) in tokens for SmolLM3-3B is 2,359,296. To calculate the number of tokens seen at a given step:

nb_tokens = nb_step * GBS
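
For example, a quick sanity check of the numbers above (the step interval and GBS come from this card; the helper function is only illustrative):

# number of tokens seen after a given training step
GBS_TOKENS = 2_359_296  # global batch size in tokens for SmolLM3-3B

def tokens_at_step(step: int) -> int:
    return step * GBS_TOKENS

print(tokens_at_step(40_000))     # 94,371,840,000 ~ 94.4B tokens per checkpoint interval
print(tokens_at_step(4_720_000))  # ~ 11.1T tokens by the end of Stage 3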

Training Stages

Stage 1: Steps 0 to 3,450,000 (86 checkpoints) config

Stage 2: Steps 3,450,000 to 4,200,000 (19 checkpoints) config

Stage 3: Steps 4,200,000 to 4,720,000 (13 checkpoints) config


Long Context Extension

For the two additional stages that extend the context length to 64k, we sample checkpoints every 4,000 steps (9.4B tokens), for a total of 10 checkpoints:

Long Context 4k to 32k config

Long Context 32k to 64k config


Post-training

We release checkpoints for each stage of our post-training recipe: mid-training, SFT, APO soup, and the long-context (LC) expert.


How to Load a Checkpoint

# pip install transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM3-3B-checkpoints"
revision = "stage1-step-40000"  # replace with the revision you want

# pick the best available device
device = torch.device(
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)

tokenizer = AutoTokenizer.from_pretrained(checkpoint, revision=revision)
model = AutoModelForCausalLM.from_pretrained(checkpoint, revision=revision).to(device)

inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50)  # adjust max_new_tokens as needed
print(tokenizer.decode(outputs[0]))
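
To see which checkpoint revisions are available, you can list the branches of the repository with huggingface_hub (a minimal sketch; each checkpoint is stored as a branch named like the stage1-step-40000 revision used above):

# pip install huggingface_hub
from huggingface_hub import list_repo_refs

refs = list_repo_refs("HuggingFaceTB/SmolLM3-3B-checkpoints")
revisions = sorted(branch.name for branch in refs.branches)
print(revisions[:5])  # e.g. ['stage1-step-40000', ...]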

License

Apache 2.0
