- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

Collections including paper arxiv:2505.19147
- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 166
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
  Paper • 2506.01939 • Published • 128
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  Paper • 2505.24864 • Published • 112
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 87

- TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations
  Paper • 2505.18125 • Published • 109
- On-Policy RL with Optimal Reward Baseline
  Paper • 2505.23585 • Published • 14
- Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering
  Paper • 2505.23604 • Published • 23
- Are Reasoning Models More Prone to Hallucination?
  Paper • 2505.23646 • Published • 24

- Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
  Paper • 2504.01990 • Published • 285
- InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
  Paper • 2504.10479 • Published • 268
- What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models
  Paper • 2503.24235 • Published • 54
- Seedream 3.0 Technical Report
  Paper • 2504.11346 • Published • 64

- One-Minute Video Generation with Test-Time Training
  Paper • 2504.05298 • Published • 105
- MoCha: Towards Movie-Grade Talking Character Synthesis
  Paper • 2503.23307 • Published • 134
- Towards Understanding Camera Motions in Any Video
  Paper • 2504.15376 • Published • 157
- Antidistillation Sampling
  Paper • 2504.13146 • Published • 61

- CoRAG: Collaborative Retrieval-Augmented Generation
  Paper • 2504.01883 • Published • 10
- VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
  Paper • 2504.08837 • Published • 43
- Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
  Paper • 2504.10068 • Published • 30
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
  Paper • 2504.10481 • Published • 84

- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 157
- Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
  Paper • 2504.01990 • Published • 285
- Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research
  Paper • 2502.04644 • Published • 2
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
  Paper • 2504.07956 • Published • 46