ParScale
Base models trained on 1T high-quality tokens, competitive with existing SOTA small models (<2B parameters); see the loading sketch after this list.
Instruct models derived from the ParScale-1.8B base models, fine-tuned on SmolTalk-1M to enable conversational capabilities.
Checkpoints for parameter-efficient fine-tuning (PEFT) of Qwen-2.5; the backbone weights are frozen.
Continual pre-training of the Qwen-2.5-3B model.
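These checkpoints are hosted on the Hugging Face Hub and can be loaded with the standard transformers API. Below is a minimal sketch; the repo id is illustrative (substitute any checkpoint from the collections above), and trust_remote_code=True is assumed since the parallel-stream architecture may ship custom modeling code.

```python
# Minimal sketch: loading a ParScale checkpoint with Hugging Face transformers.
# The repo id is an assumed example, not a confirmed identifier.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "ParScale/ParScale-1.8B-P8"  # assumed repo id for illustration

# trust_remote_code=True is assumed, in case the model defines custom layers
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

prompt = "Parallel scaling lets a language model"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```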