Nemo base model pretrained on 2 billion of a planned 4 billion tokens. Intended as a starting point for additional conversational/instruct tuning; see the loading sketch below.
Eval Loss: 1.95439 -> 1.92584
Stopped this run at 12k steps out of roughly 21k. The main issue was the DCLM dataset, which is too low quality for such a small training run. I'll revisit this with higher-quality data.
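Since this is a base (non-instruct) checkpoint, it can be loaded directly with `transformers` for evaluation or as the initialization for further tuning. A minimal sketch, assuming a standard `transformers`-compatible checkpoint; the repo id below is a placeholder, not the actual model path.

```python
# Minimal loading sketch -- the repo id is hypothetical; substitute the real one.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/nemo-base-partial"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Base model: expects plain-text continuation, no chat template applied yet.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```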