GUI Agents - a m-ric Collection

GUI Agents

updated May 16

Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

Paper • 2310.11441 • Published Oct 17, 2023 • 28

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Paper • 2501.12326 • Published Jan 21 • 62

GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

Paper • 2406.08451 • Published Jun 12, 2024 • 26

GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents

Paper • 2406.10819 • Published Jun 16, 2024 • 1

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Paper • 2412.04454 • Published Dec 5, 2024 • 66

UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning

Paper • 2503.21620 • Published Mar 27 • 62

JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse

Paper • 2503.16365 • Published Mar 20 • 41

Seed1.5-VL Technical Report

Paper • 2505.07062 • Published May 11 • 145