Coherent Multimodal Reasoning with Iterative Self-Evaluation for Vision-Language Models Paper • 2508.02886 • Published 19 days ago • 1
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published 10 days ago • 135
DINOv3 Collection DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 13 items • Updated 2 days ago • 219
Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • May 12 • 510
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16 • 263