Submitted by zlatamaria 83 Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA · 9 authors 3
Submitted by Shunian 27 FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion · 8 authors 1
Submitted by abhi1nandy2 25 Leveraging Self-Attention for Input-Dependent Soft Prompting in LLMs · 3 authors 1
Submitted by russwang 24 MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning · 13 authors 1
Submitted by chenguolin 16 PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers · 7 authors 2
Submitted by thomagram 14 STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis · 10 authors 1
Submitted by dcml0714 12 Audio-Aware Large Language Models as Judges for Speaking Styles · 11 authors 3
Submitted by cg1177 6 Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision · 8 authors 1
Submitted by Hoyard 4 3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model · 7 authors 1
Submitted by zhwang01 4 CodeContests+: High-Quality Test Case Generation for Competitive Programming · 5 authors 1
Submitted by EmetTheGolum 4 Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds' Annotated Imagery · 4 authors 1
Submitted by JohnCage 4 Prefix Grouper: Efficient GRPO Training through Shared-Prefix Forward · 8 authors 1
Submitted by guineapig 4 HASHIRU: Hierarchical Agent System for Hybrid Intelligent Resource Utilization · 3 authors 1
Submitted by salman-abdullah 3 MIRIAD: Augmenting LLMs with millions of medical query-response pairs · 10 authors 1
Submitted by MauroC 3 Splatting Physical Scenes: End-to-End Real-to-Sim from Imperfect Robot Data · 6 authors 1
Submitted by benshi34 2 When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration · 6 authors 1
Submitted by sy1998 2 When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding · 10 authors 1
Submitted by lss727 2 Truth in the Few: High-Value Data Selection for Efficient Multi-Modal Reasoning · 9 authors 1
Submitted by neildlf 2 GuideX: Guided Synthetic Data Generation for Zero-Shot Information Extraction · 4 authors 1
Submitted by DhavalPatel - AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance · 8 authors 1
Submitted by scott-yjyang - Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning · 11 authors 1