RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation Paper • 2603.25804 • Published 7 days ago • 26
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published 9 days ago • 92
The Y-Combinator for LLMs: Solving Long-Context Rot with λ-Calculus Paper • 2603.20105 • Published 14 days ago • 36
FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use Paper • 2603.08262 • Published 25 days ago • 42
view article Article Beyond Semantic Similarity: Introducing NVIDIA NeMo Retriever’s Generalizable Agentic Retrieval Pipeline 20 days ago • 39
Nemotron RAG Collection Set of tools to build retrieval-augmented generation (RAG) systems, improve search and ranking accuracy, and extract structured data from complex docs • 9 items • Updated 2 days ago • 84
view article Article Build an Agent That Thinks Like a Data Scientist: How We Hit #1 on DABStep with Reusable Tool Generation 21 days ago • 14
Transformers.js V4 demos Collection A collection of demos built with Transformers.js V4 • 23 items • Updated about 13 hours ago • 42
Does Your Reasoning Model Implicitly Know When to Stop Thinking? Paper • 2602.08354 • Published Feb 9 • 262
MARS: Modular Agent with Reflective Search for Automated AI Research Paper • 2602.02660 • Published Feb 2 • 66
Endless Terminals: Scaling RL Environments for Terminal Agents Paper • 2601.16443 • Published Jan 23 • 18
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published Jan 13 • 150
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning Paper • 2601.09667 • Published Jan 14 • 92
Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge Paper • 2601.08808 • Published Jan 13 • 39