view article Article ScreenSuite - The most comprehensive evaluation suite for GUI Agents! 5 days ago • 34
Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design Paper • 2506.04734 • Published 6 days ago • 18
view article Article No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL By toslali-ibm and 5 others • 8 days ago • 44
NextCoder Collection NextCoder family of code-editing LMs developed with Selective Knowledge Transfer and its training data. • 5 items • Updated May 5 • 47
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning Paper • 2505.23754 • Published 12 days ago • 15
SYNTHETIC-1 Collection A collection of tasks & verifiers for reasoning datasets • 9 items • Updated Feb 20 • 61
OpenMath Collection A collection of models and datasets introduced in "OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset" • 15 items • Updated 5 days ago • 44
Nemotron 4 340B Collection Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models. • 4 items • Updated 5 days ago • 163
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions Paper • 2410.10816 • Published Oct 14, 2024 • 21
view article Article Llama 3.1 - 405B, 70B & 8B with multilinguality and long context By philschmid and 7 others • Jul 23, 2024 • 234
Running on CPU Upgrade 13.2k 13.2k Open LLM Leaderboard 🏆 Track, rank and evaluate open LLMs and chatbots