CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering Paper • 2501.01371 • Published Jan 2, 2025 • 1
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM Paper • 2503.04724 • Published Mar 6, 2025 • 72
Lost in Embeddings: Information Loss in Vision-Language Models Paper • 2509.11986 • Published Sep 15, 2025 • 28
HANRAG: Heuristic Accurate Noise-resistant Retrieval-Augmented Generation for Multi-hop Question Answering Paper • 2509.09713 • Published Sep 8, 2025 • 24
Inpainting-Guided Policy Optimization for Diffusion Large Language Models Paper • 2509.10396 • Published Sep 12, 2025 • 15
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10, 2025 • 190
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published Sep 2, 2025 • 83
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning Paper • 2508.21113 • Published Aug 28, 2025 • 110
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning Paper • 2508.20751 • Published Aug 28, 2025 • 89
MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds Paper • 2508.14879 • Published Aug 20, 2025 • 68
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs Paper • 2508.16153 • Published Aug 22, 2025 • 160
Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off Paper • 2508.04825 • Published Aug 6, 2025 • 58
Adapting Vision-Language Models Without Labels: A Comprehensive Survey Paper • 2508.05547 • Published Aug 7, 2025 • 11
Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation Paper • 2508.09987 • Published Aug 13, 2025 • 25
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer Paper • 2508.10893 • Published Aug 14, 2025 • 31