Intern-S1: A Scientific Multimodal Foundation Model Paper • 2508.15763 • Published 6 days ago • 236
Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 211
Article 🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly! – Talking About It? By Kseniase • Mar 17 • 328
Article I trained a Language Model to schedule events with GRPO! By anakin87 • Apr 29 • 86
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16 • 263
Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • May 12 • 516
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published Feb 11 • 57
Article Open-source DeepResearch – Freeing our search agents By m-ric and 4 others • Feb 4 • 1.29k
Article SmolVLM - small yet mighty Vision Language Model By andito and 4 others • Nov 26, 2024 • 350
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 26 items • Updated May 1 • 574
Article Mixture of Experts Explained By osanseviero and 5 others • Dec 11, 2023 • 852
Article SeeMoE: Implementing a MoE Vision Language Model from Scratch By AviSoori1x • Jun 23, 2024 • 35
FLAME: Factuality-Aware Alignment for Large Language Models Paper • 2405.01525 • Published May 2, 2024 • 29
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Paper • 2405.01434 • Published May 2, 2024 • 57
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published Apr 29, 2024 • 122