OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models Paper • 2503.08686 • Published 10 days ago • 18
VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering Paper • 2503.06492 • Published 12 days ago • 10
Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts Paper • 2503.05066 • Published 15 days ago • 4
PlainQAFact: Automatic Factuality Evaluation Metric for Biomedical Plain Language Summaries Generation Paper • 2503.08890 • Published 10 days ago • 2
Temporal Regularization Makes Your Video Generator Stronger Paper • 2503.15417 • Published 2 days ago • 20
STEVE: A Step Verification Pipeline for Computer-use Agent Training Paper • 2503.12532 • Published 5 days ago • 13
ViSpeak: Visual Instruction Feedback in Streaming Videos Paper • 2503.12769 • Published 5 days ago • 8
GKG-LLM: A Unified Framework for Generalized Knowledge Graph Construction Paper • 2503.11227 • Published 7 days ago • 8
Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning Paper • 2503.13360 • Published 4 days ago • 5