Nemo base model, pretrained on 2 billion of a planned 4 billion tokens. Intended for further conversational/instruct tuning.

Eval Loss: 1.95439 -> 1.92584

Stopped this run at 12k steps out of roughly 21k. The main issue was the DCLM dataset, which is too low quality for such a small training run; I'll revisit this with higher-quality data.
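
A minimal loading sketch for anyone who wants to try the checkpoint or continue tuning, assuming the standard `transformers` API; the repo ID below is a placeholder, not this model's actual path.

```python
# Minimal sketch: load the checkpoint for inference or further instruct tuning.
# "your-org/nemo-base-12b" is a placeholder — substitute this repo's actual ID.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/nemo-base-12b"  # placeholder repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

# This is a base model: expect plain-text continuation, not chat-formatted replies.
inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```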

Model size: 12.2B params, BF16 (Safetensors)