Nemo base model pretrained on 2 billion of a planned 4 billion tokens. Intended as a starting point for additional conversational/instruct tuning; see the loading sketch below.
Eval Loss: 1.95439 -> 1.92584
Stopped this run at 12k steps out of roughly 21k. The main issue was the DCLM dataset, which is too low quality for such a small training run. I'll revisit this with higher-quality data.
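Since this is a base (non-instruct) checkpoint, it can be loaded directly with `transformers` for evaluation or as the initialization for further tuning. A minimal sketch, assuming a standard `transformers`-compatible checkpoint; the repo id below is a placeholder, not the actual model path.

```python
# Minimal loading sketch -- the repo id is hypothetical; substitute the real one.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/nemo-base-partial"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Base model: expects plain-text continuation, no chat template applied yet.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```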