InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation Paper • 2407.06423 • Published Jul 8, 2024
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction Paper • 2503.15661 • Published Mar 19, 2025
StarFlow: Generating Structured Workflow Outputs From Sketch Images Paper • 2503.21889 • Published Mar 27, 2025
Rendering-Aware Reinforcement Learning for Vector Graphics Generation Paper • 2505.20793 • Published 15 days ago
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering Paper • 2306.09996 • Published Jun 16, 2023
Benchmarking Vision Language Models for Cultural Understanding Paper • 2407.10920 • Published Jul 15, 2024
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding Paper • 2306.08832 • Published Jun 15, 2023
Distilling Semantically Aware Orders for Autoregressive Image Generation Paper • 2504.17069 • Published Apr 23, 2025
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories Paper • 2504.08942 • Published Apr 11, 2025
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published Apr 2, 2025
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks Paper • 2412.04626 • Published Dec 5, 2024