LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model Paper • 2509.00676 • Published 10 days ago • 76
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published 9 days ago • 63
Attributes as Textual Genes: Leveraging LLMs as Genetic Algorithm Simulators for Conditional Synthetic Data Generation Paper • 2509.02040 • Published 8 days ago • 13
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation Paper • 2506.06962 • Published Jun 8 • 29
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy Paper • 2506.13284 • Published Jun 16 • 27
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation Paper • 2506.03930 • Published Jun 4 • 26
PhyX: Does Your Model Have the "Wits" for Physical Reasoning? Paper • 2505.15929 • Published May 21 • 49
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning Paper • 2505.16400 • Published May 22 • 34
WebDreamer Collection Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents • 6 items • Updated Apr 14 • 5