-
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Paper • 1802.08802 • Published -
Mapping Natural Language Commands to Web Elements
Paper • 1808.09132 • Published -
Learning to Navigate the Web
Paper • 1812.09195 • Published -
Interactive Task and Concept Learning from Natural Language Instructions and GUI Demonstrations
Paper • 1909.00031 • Published
Collections
Discover the best community collections!
Collections including paper arxiv:2409.08264
-
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection
Paper • 2409.08513 • Published • 9 -
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
Paper • 2409.08264 • Published • 40 -
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Paper • 2409.12191 • Published • 55 -
LLMs + Persona-Plug = Personalized LLMs
Paper • 2409.11901 • Published • 28
-
Automated Design of Agentic Systems
Paper • 2408.08435 • Published • 37 -
Self-Refine: Iterative Refinement with Self-Feedback
Paper • 2303.17651 • Published • 2 -
Automating Thought of Search: A Journey Towards Soundness and Completeness
Paper • 2408.11326 • Published • 1 -
Building Math Agents with Multi-Turn Iterative Preference Learning
Paper • 2409.02392 • Published • 14
-
Law of Vision Representation in MLLMs
Paper • 2408.16357 • Published • 92 -
CogVLM2: Visual Language Models for Image and Video Understanding
Paper • 2408.16500 • Published • 56 -
Learning to Move Like Professional Counter-Strike Players
Paper • 2408.13934 • Published • 21 -
Building and better understanding vision-language models: insights and future directions
Paper • 2408.12637 • Published • 109
-
Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation
Paper • 2408.11812 • Published • 4 -
WebArena: A Realistic Web Environment for Building Autonomous Agents
Paper • 2307.13854 • Published • 23 -
Agent Workflow Memory
Paper • 2409.07429 • Published • 25 -
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
Paper • 2409.08264 • Published • 40
-
Automated Design of Agentic Systems
Paper • 2408.08435 • Published • 37 -
On the limits of agency in agent-based models
Paper • 2409.10568 • Published • 12 -
On the Diagram of Thought
Paper • 2409.10038 • Published • 8 -
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?
Paper • 2409.07703 • Published • 59
-
Can large language models explore in-context?
Paper • 2403.15371 • Published • 31 -
Advancing LLM Reasoning Generalists with Preference Trees
Paper • 2404.02078 • Published • 43 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 34 -
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper • 2404.03715 • Published • 59
-
SLiMe: Segment Like Me
Paper • 2309.03179 • Published • 29 -
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
Paper • 2309.02591 • Published • 14 -
Efficient Memory Management for Large Language Model Serving with PagedAttention
Paper • 2309.06180 • Published • 25 -
LEAP Hand: Low-Cost, Efficient, and Anthropomorphic Hand for Robot Learning
Paper • 2309.06440 • Published • 9