Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published Dec 20, 2024 • 38
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published Dec 5, 2024 • 60
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases Paper • 2406.10290 • Published Jun 12, 2024
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs Paper • 2410.04698 • Published Oct 7, 2024 • 13
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs Paper • 2410.04698 • Published Oct 7, 2024 • 13
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16, 2024 • 98
ThinK: Thinner Key Cache by Query-Driven Pruning Paper • 2407.21018 • Published Jul 30, 2024 • 31
ThinK: Thinner Key Cache by Query-Driven Pruning Paper • 2407.21018 • Published Jul 30, 2024 • 31
ThinK: Thinner Key Cache by Query-Driven Pruning Paper • 2407.21018 • Published Jul 30, 2024 • 31
CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities Paper • 2401.06961 • Published Jan 13, 2024
Towards Understanding the Behaviors of Optimal Deep Active Learning Algorithms Paper • 2101.00977 • Published Dec 29, 2020
Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations Paper • 2310.11207 • Published Oct 17, 2023
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models Paper • 2209.07511 • Published Sep 15, 2022