Simple Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization Paper • 2505.07675 • Published 7 days ago • 14
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly Paper • 2505.10610 • Published 4 days ago • 41
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis Paper • 2505.10046 • Published 5 days ago • 10
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models Paper • 2505.10554 • Published 4 days ago • 106
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning Paper • 2505.08617 • Published 6 days ago • 35
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning Paper • 2505.10320 • Published 4 days ago • 17
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning Paper • 2505.10557 • Published 4 days ago • 39
Article TinyAgents: A Minimal Experiment with Code Agents and MCP Tools By albertvillanova • 4 days ago • 25
Article The Transformers Library: standardizing model definitions By lysandre and 3 others • 5 days ago • 84
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder Paper • 2505.07916 • Published 7 days ago • 114
HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation Paper • 2504.21650 • Published 19 days ago • 15
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning Paper • 2505.04601 • Published 12 days ago • 21
Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers Paper • 2505.04842 • Published 12 days ago • 12
LLM-Independent Adaptive RAG: Let the Question Speak for Itself Paper • 2505.04253 • Published 13 days ago • 11
R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training Paper • 2505.00358 • Published 19 days ago • 20
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant Paper • 2505.05467 • Published 11 days ago • 13
Article Good answers are not necessarily factual answers: an analysis of hallucination in leading LLMs By davidberenstein1957 and 1 other • 13 days ago • 28