Submitted by Xueqing 82 MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation · 44 authors 3
Submitted by nicolaus625 48 CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following · 5 authors 2
Submitted by mparvez 34 Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team · 4 authors 2
Submitted by shun-zheng 32 Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs · 12 authors 5
Submitted by zhangshaolei 26 Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model · 5 authors 2
Submitted by koustuvs 23 V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning · 30 authors 2
Submitted by amsabour 16 Align Your Flow: Scaling Continuous-Time Flow Map Distillation · 3 authors 4
Submitted by yilunzhao 16 Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval: A Systematic Evaluation of Fault Coverage and Exposure · 4 authors 2
Submitted by cetosignis 12 From Bytes to Ideas: Language Modeling with Autoregressive U-Nets · 6 authors 3
Submitted by ahmedheakl 11 Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees · 5 authors 2
Submitted by zichenwen 10 EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models · 8 authors 2
Submitted by Liuff23 9 xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations · 33 authors 2
Submitted by CostaliyA 9 CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios · 9 authors 2
Submitted by Siyuc 6 Taming Polysemanticity in LLMs: Provable Feature Recovery via Sparse Autoencoders · 5 authors 2
Submitted by XaiverZ 6 Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning · 3 authors 2
Submitted by Xuandong 5 AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents · 4 authors 3
Submitted by akhaliq 4 Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs · 46 authors 2
Submitted by JJ-TMT 4 CAMS: A CityGPT-Powered Agentic Framework for Urban Human Mobility Simulation · 4 authors 2
Submitted by dsouzadaniel 3 Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers · 5 authors 2
Submitted by amanchadha 3 Alignment Quality Index (AQI): Beyond Refusals: AQI as an Intrinsic Alignment Diagnostic via Latent Geometry, Cluster Divergence, and Layer-wise Pooled Representations · 15 authors 2
Submitted by FaiyazAbdullah114708 2 VisText-Mosquito: A Multimodal Dataset and Benchmark for AI-Based Mosquito Breeding Site Detection and Reasoning · 7 authors 2
Submitted by BeileiCui 2 TR2M: Transferring Monocular Relative Depth to Metric Depth with Language Descriptions and Scale-Oriented Contrast · 4 authors 2
Submitted by hsichelin 2 EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction · 4 authors 2
Submitted by ChetKao 2 Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning · 7 authors 2
Submitted by MaxDu 1 DynaGuide: Steering Diffusion Policies with Active Dynamic Guidance · 2 authors 2