view article Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • 26 days ago • 417
view article Article nanoVLM: The simplest repository to train your VLM in pure PyTorch By ariG23498 and 6 others • 17 days ago • 140
view article Article The case for specialized pre-training: ultra-fast foundation models for dedicated tasks By Pclanglais • Aug 4, 2024 • 30
Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval Paper • 2505.16967 • Published 15 days ago • 22
WebThinker: Empowering Large Reasoning Models with Deep Research Capability Paper • 2504.21776 • Published Apr 30 • 56
Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published Apr 29 • 94
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published Apr 17 • 92
view article Article DABStep: Data Agent Benchmark for Multi-step Reasoning By eggie5 and 5 others • Feb 4 • 89
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 115
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published Mar 18 • 128
RLVR Collection Model and data for 'Expanding RL with Verifiable Rewards Across Diverse Domains' • 3 items • Updated Mar 31 • 11
ReSearch Collection Trained models as described in the paper "ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning" • 5 items • Updated Mar 27 • 6
Model Merging Collection Model Merging is a very popular technique nowadays in LLM. Here is a chronological list of papers on the space that will help you get started with it! • 30 items • Updated Jun 12, 2024 • 238
view article Article The N Implementation Details of RLHF with PPO By vwxyzjn and 2 others • Oct 24, 2023 • 58