Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • 4 days ago • 479
meta-llama/Meta-Llama-3-8B-Instruct Text Generation • 8B • Updated 23 days ago • 1.44M • 4.06k
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13 • 99
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14 • 295
Article Fine-tune ModernBERT for text classification using synthetic data By davidberenstein1957 • Dec 30, 2024 • 38
Article 🐺🐦‍⬛ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark By wolfram • Jan 2 • 41
Article Fine-tune a SmolLM on domain-specific synthetic data from a LLM By davidberenstein1957 • Jan 3 • 37
Article Accelerating Language Model Inference with Mixture of Attentions By hba123 and 1 other • Jan 7 • 24