Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models Paper • 2501.09686 • Published 1 day ago • 16
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning Paper • 2501.03226 • Published 12 days ago • 35
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published 14 days ago • 82
view article Article Introducing smolagents: simple agents that write actions in code. 18 days ago • 505
O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning Paper • 2501.06458 • Published 7 days ago • 29
view article Article 🅰️ℹ️ 1️⃣0️⃣1️⃣ **What is HtmlRAG, Multimodal RAG and Agentic RAG?** By Kseniase • 9 days ago • 6
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web Paper • 2402.17553 • Published Feb 27, 2024 • 23
C4AI Aya Expanse Collection Aya Expanse is an open-weight research release of a model with highly advanced multilingual capabilities. • 3 items • Updated Dec 16, 2024 • 30
Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published 29 days ago • 38
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 128
AgentInstruct: Toward Generative Teaching with Agentic Flows Paper • 2407.03502 • Published Jul 3, 2024 • 51
Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning Paper • 2410.22304 • Published Oct 29, 2024 • 17
Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning Paper • 2410.21845 • Published Oct 29, 2024 • 13
CLEAR: Character Unlearning in Textual and Visual Modalities Paper • 2410.18057 • Published Oct 23, 2024 • 200
Direct Preference Optimization: Your Language Model is Secretly a Reward Model Paper • 2305.18290 • Published May 29, 2023 • 52
Direct Language Model Alignment from Online AI Feedback Paper • 2402.04792 • Published Feb 7, 2024 • 30