Unified Agent Trajectory Specification

university

https://www.cs.cmu.edu/~neulab/

authored 4 papers 6 months ago

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

Paper • 2510.24702 • Published Oct 28, 2025 • 31

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Paper • 2510.25726 • Published Oct 29, 2025 • 47

Simulating Environments with Reasoning Models for Agent Training

Paper • 2511.01824 • Published Nov 3, 2025 • 2

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

Paper • 2512.07783 • Published Dec 8, 2025 • 40

authored 10 papers 11 months ago

Small Models Struggle to Learn from Strong Reasoners

Paper • 2502.12143 • Published Feb 17, 2025 • 40

Evaluating Vision-Language Models as Evaluators in Path Planning

Paper • 2411.18711 • Published Nov 27, 2024

VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search

Paper • 2503.10582 • Published Mar 13, 2025 • 25

Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators

Paper • 2503.19877 • Published Mar 25, 2025 • 1

VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge

Paper • 2504.10342 • Published Apr 14, 2025 • 11

Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time

Paper • 2504.12329 • Published Apr 12, 2025

Overtrained Language Models Are Harder to Fine-Tune

Paper • 2503.19206 • Published Mar 24, 2025 • 2

The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think

Paper • 2505.10185 • Published May 15, 2025 • 26

VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation

Paper • 2506.03930 • Published Jun 4, 2025 • 27

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

Paper • 2507.00432 • Published Jul 1, 2025 • 79

authored 4 papers about 1 year ago

Beyond Browsing: API-Based Web Agents

Paper • 2410.16464 • Published Oct 21, 2024 • 2

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Paper • 2503.07920 • Published Mar 10, 2025 • 101

SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills

Paper • 2504.07079 • Published Apr 9, 2025 • 12

VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge

Paper • 2504.10342 • Published Apr 14, 2025 • 11

authored a paper about 1 year ago

ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations

Paper • 2504.00824 • Published Apr 1, 2025 • 43