Glyph: Scaling Context Windows via Visual-Text Compression Paper • 2510.17800 • Published 14 days ago • 64
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models Paper • 2412.11605 • Published Dec 16, 2024 • 18
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents Paper • 2410.24024 • Published Oct 31, 2024 • 49
NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts Paper • 2405.04520 • Published May 7, 2024 • 1
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning Paper • 2411.02337 • Published Nov 4, 2024 • 36
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents Paper • 2410.24024 • Published Oct 31, 2024 • 49
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents Paper • 2408.06327 • Published Aug 12, 2024 • 17