CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks Paper • 2507.23751 • Published 22 days ago • 4
nvidia/Nemotron-Research-Reasoning-Qwen-1.5B Text Generation • 2B • Updated 10 days ago • 10.7k • 210
τ^2-Bench: Evaluating Conversational Agents in a Dual-Control Environment Paper • 2506.07982 • Published Jun 9 • 6
Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face By abidlabs and 4 others • 25 days ago • 158
SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM Paper • 2504.14286 • Published Apr 19 • 1
Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning Paper • 2507.17512 • Published about 1 month ago • 36