ugryumnik's Collections
Backlog
Visual-RFT: Visual Reinforcement Fine-Tuning
Paper • 2503.01785
When an LLM is apprehensive about its answers -- and when its uncertainty is justified
Paper • 2503.01688
Predictive Data Selection: The Data That Predicts Is the Data That Teaches
Paper • 2503.00808
Chain of Draft: Thinking Faster by Writing Less
Paper • 2502.18600
Multi-Turn Code Generation Through Single-Step Rewards
Paper • 2502.20380
Self-rewarding correction for mathematical reasoning
Paper • 2502.19613
MPO: Boosting LLM Agents with Meta Plan Optimization
Paper • 2503.02682
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper • 2501.04519
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161
Evolving Deeper LLM Thinking
Paper • 2501.09891
AgentTuning: Enabling Generalized Agent Abilities for LLMs
Paper • 2310.12823
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper • 2501.05366
The Lessons of Developing Process Reward Models in Mathematical Reasoning
Paper • 2501.07301
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
Paper • 2501.04682
Agent Laboratory: Using LLM Agents as Research Assistants
Paper • 2501.04227
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Paper • 2412.19723
GuardReasoner: Towards Reasoning-based LLM Safeguards
Paper • 2501.18492
Towards Best Practices for Open Datasets for LLM Training
Paper • 2501.08365
Paper • 2412.15115
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
Paper • 2412.14922
Training Large Language Models to Reason in a Continuous Latent Space
Paper • 2412.06769
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Paper • 2411.04905
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917
Survey on Evaluation of LLM-based Agents
Paper • 2503.16416
Large Language Model Agent: A Survey on Methodology, Applications and Challenges
Paper • 2503.21460
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Paper • 2503.21614
ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations
Paper • 2504.00824
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
Paper • 2504.01990
Inference-Time Scaling for Generalist Reward Modeling
Paper • 2504.02495
Agentic Knowledgeable Self-awareness
Paper • 2504.03553
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens
Paper • 2504.07096
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
Paper • 2504.06514
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
Paper • 2504.07128
Iterative Self-Training for Code Generation via Reinforced Re-Ranking
Paper • 2504.09643
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization
Paper • 2504.10127
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper • 2504.10481