ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning Paper • 2506.09513 • Published 27 days ago • 96
Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization Paper • 2506.10920 • Published 25 days ago • 6
Revisit What You See: Disclose Language Prior in Vision Tokens for Efficient Guided Decoding of LVLMs Paper • 2506.09522 • Published 27 days ago • 20
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax Paper • 2504.20966 • Published Apr 29 • 31
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning Paper • 2504.13820 • Published Apr 18 • 17
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization Paper • 2504.13173 • Published Apr 17 • 19
Describe Anything: Detailed Localized Image and Video Captioning Paper • 2504.16072 • Published Apr 22 • 61
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation Paper • 2504.13055 • Published Apr 17 • 19
Perception Encoder: The best visual embeddings are not at the output of the network Paper • 2504.13181 • Published Apr 17 • 34
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling Paper • 2504.13169 • Published Apr 17 • 39
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding Paper • 2504.09925 • Published Apr 14 • 38
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 276
Towards Visual Text Grounding of Multimodal Large Language Model Paper • 2504.04974 • Published Apr 7 • 16
Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published Apr 10 • 29
Clinical ModernBERT: An efficient and long context encoder for biomedical text Paper • 2504.03964 • Published Apr 4 • 6
Concept Lancet: Image Editing with Compositional Representation Transplant Paper • 2504.02828 • Published Apr 3 • 17
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models Paper • 2504.02821 • Published Apr 3 • 11