kaizuberbuehler's Collections
Agents
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Paper
•
2412.14161
•
Published
•
52
Training Software Engineering Agents and Verifiers with SWE-Gym
Paper
•
2412.21139
•
Published
•
23
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Paper
•
2412.19723
•
Published
•
89
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
Paper
•
2408.00764
•
Published
•
1
More Agents Is All You Need
Paper
•
2402.05120
•
Published
•
54
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
Paper
•
2402.07456
•
Published
•
45
Generative Agents: Interactive Simulacra of Human Behavior
Paper
•
2304.03442
•
Published
•
12
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Paper
•
2310.04406
•
Published
•
10
AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation
Paper
•
2312.13010
•
Published
•
5
GAIA: a benchmark for General AI Assistants
Paper
•
2311.12983
•
Published
•
196
LLM Agent Operating System
Paper
•
2403.16971
•
Published
•
69
Octopus v2: On-device language model for super agent
Paper
•
2404.01744
•
Published
•
59
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation
Paper
•
2404.12753
•
Published
•
44
Scaling Instructable Agents Across Many Simulated Worlds
Paper
•
2404.10179
•
Published
•
29
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Paper
•
2404.07972
•
Published
•
50
WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents
Paper
•
2404.05902
•
Published
•
23
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Paper
•
2404.05719
•
Published
•
83
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
Paper
•
2404.03648
•
Published
•
29
Voyager: An Open-Ended Embodied Agent with Large Language Models
Paper
•
2305.16291
•
Published
•
10
LASER: LLM Agent with State-Space Exploration for Web Navigation
Paper
•
2309.08172
•
Published
•
13
The Rise and Potential of Large Language Model Based Agents: A Survey
Paper
•
2309.07864
•
Published
•
7
Reflexion: Language Agents with Verbal Reinforcement Learning
Paper
•
2303.11366
•
Published
•
5
LEGENT: Open Platform for Embodied Agents
Paper
•
2404.18243
•
Published
•
23
Diffusion for World Modeling: Visual Details Matter in Atari
Paper
•
2405.12399
•
Published
•
31
OpenVLA: An Open-Source Vision-Language-Action Model
Paper
•
2406.09246
•
Published
•
39
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
Paper
•
2305.17390
•
Published
•
3
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains
Paper
•
2407.18961
•
Published
•
41
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
Paper
•
2407.18901
•
Published
•
34
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Paper
•
2407.21787
•
Published
•
13
OmniParser for Pure Vision Based GUI Agent
Paper
•
2408.00203
•
Published
•
26
WebArena: A Realistic Web Environment for Building Autonomous Agents
Paper
•
2307.13854
•
Published
•
25
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper
•
2407.20798
•
Published
•
25
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents
Paper
•
2408.07060
•
Published
•
43
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
Paper
•
2408.06292
•
Published
•
124
SWE-bench-java: A GitHub Issue Resolving Benchmark for Java
Paper
•
2408.14354
•
Published
•
42
AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments
Paper
•
2405.07960
•
Published
•
1
On the limits of agency in agent-based models
Paper
•
2409.10568
•
Published
•
14
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?
Paper
•
2409.07703
•
Published
•
69
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
Paper
•
2409.16299
•
Published
•
12
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use
Paper
•
2411.10323
•
Published
•
35
Generative World Explorer
Paper
•
2411.11844
•
Published
•
78
Paper
•
2412.13501
•
Published
•
29
Large Action Models: From Inception to Implementation
Paper
•
2412.10047
•
Published
•
35
A3: Android Agent Arena for Mobile GUI Agents
Paper
•
2501.01149
•
Published
•
22
ResearchTown: Simulator of Human Research Community
Paper
•
2412.17767
•
Published
•
14
PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World
Paper
•
2412.17589
•
Published
•
12
Agent-SafetyBench: Evaluating the Safety of LLM Agents
Paper
•
2412.14470
•
Published
•
12
GenEx: Generating an Explorable World
Paper
•
2412.09624
•
Published
•
96
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
Paper
•
2412.09605
•
Published
•
30
The BrowserGym Ecosystem for Web Agent Research
Paper
•
2412.05467
•
Published
•
21
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper
•
2412.04454
•
Published
•
66
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection
Paper
•
2412.04455
•
Published
•
39
MALT: Improving Reasoning with Multi-Agent LLM Training
Paper
•
2412.01928
•
Published
•
45
Mars-PO: Multi-Agent Reasoning System Preference Optimization
Paper
•
2411.19039
•
Published
•
1
Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning
Paper
•
2410.22304
•
Published
•
18
MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation
Paper
•
2411.17636
•
Published
•
2
Cooperative Strategic Planning Enhances Reasoning Capabilities in Large Language Models
Paper
•
2410.20007
•
Published
•
1
Enhancing LLM Agents for Code Generation with Possibility and Pass-rate Prioritized Experience Replay
Paper
•
2410.12236
•
Published
•
1
Large Language Model-Brained GUI Agents: A Survey
Paper
•
2411.18279
•
Published
•
32
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Paper
•
2411.17465
•
Published
•
87
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
Paper
•
2411.06559
•
Published
•
14
DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation
Paper
•
2411.04999
•
Published
•
18
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper
•
2411.03562
•
Published
•
68
Agent Laboratory: Using LLM Agents as Research Assistants
Paper
•
2501.04227
•
Published
•
91
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
Paper
•
2501.04575
•
Published
•
24
SDPO: Segment-Level Direct Preference Optimization for Social Agents
Paper
•
2501.01821
•
Published
•
19
SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
Paper
•
2310.11667
•
Published
•
3
WebWalker: Benchmarking LLMs in Web Traversal
Paper
•
2501.07572
•
Published
•
19
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
Paper
•
2501.05707
•
Published
•
20
SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution
Paper
•
2501.05040
•
Published
•
15
FAST: Efficient Action Tokenization for Vision-Language-Action Models
Paper
•
2501.09747
•
Published
•
23
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
Paper
•
2406.11896
•
Published
•
20
From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning
Paper
•
2411.03817
•
Published
•
1
PaSa: An LLM Agent for Comprehensive Academic Paper Search
Paper
•
2501.10120
•
Published
•
49
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
Paper
•
2501.12326
•
Published
•
57
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
Paper
•
2501.11425
•
Published
•
105
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
Paper
•
2501.11733
•
Published
•
29
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments
Paper
•
2501.10893
•
Published
•
26
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces
Paper
•
2501.12909
•
Published
•
70
IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems
Paper
•
2501.11067
•
Published
•
13
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
Paper
•
2501.13200
•
Published
•
68
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
Paper
•
2502.02584
•
Published
•
17
Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial?
Paper
•
2502.00674
•
Published
•
13
TwinMarket: A Scalable Behavioral and Social Simulation for Financial Markets
Paper
•
2502.01506
•
Published
•
38
Large Language Model Guided Self-Debugging Code Generation
Paper
•
2502.02928
•
Published
•
13
ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization
Paper
•
2502.04306
•
Published
•
19
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
Paper
•
2502.06060
•
Published
•
37
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging
Paper
•
2502.05664
•
Published
•
23
Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training
Paper
•
2502.06589
•
Published
•
18
WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation
Paper
•
2502.08047
•
Published
•
27
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
Paper
•
2502.09560
•
Published
•
36
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
Paper
•
2502.08235
•
Published
•
57
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper
•
2502.14499
•
Published
•
190
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
Paper
•
2502.14282
•
Published
•
20
Magma: A Foundation Model for Multimodal AI Agents
Paper
•
2502.13130
•
Published
•
58
Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
Paper
•
2502.11357
•
Published
•
10
TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding
Paper
•
2502.19400
•
Published
•
48
Towards an AI co-scientist
Paper
•
2502.18864
•
Published
•
48
PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving
Paper
•
2502.16111
•
Published
•
9
TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning
Paper
•
2502.15425
•
Published
•
9
Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration
Paper
•
2502.17110
•
Published
•
12
WebGames: Challenging General-Purpose Web-Browsing AI Agents
Paper
•
2502.18356
•
Published
•
12
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model
Paper
•
2502.18906
•
Published
•
12
Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents
Paper
•
2502.16069
•
Published
•
19
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Paper
•
2502.19328
•
Published
•
22
ATLaS: Agent Tuning via Learning Critical Steps
Paper
•
2503.02197
•
Published
•
8
Gemini Robotics: Bringing AI into the Physical World
Paper
•
2503.20020
•
Published
•
24