ViExam: Are Vision Language Models Better than Humans on Vietnamese Multimodal Exam Questions? Paper • 2508.13680 • Published 3 days ago • 5
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model Paper • 2508.14444 • Published 2 days ago • 19
Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation Paper • 2508.13998 • Published 3 days ago • 13
LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos Paper • 2508.14041 • Published 3 days ago • 47
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models Paper • 2508.09834 • Published 9 days ago • 45
G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration Paper • 2508.11379 • Published 7 days ago • 10
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model Paper • 2508.13009 • Published 4 days ago • 19
Vision Language Model Alignment in TRL ⚡️ Article • By sergiopaniego and 4 others • 16 days ago • 69
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer Paper • 2508.10893 • Published 8 days ago • 30
Optimization-Free Style Transfer for 3D Gaussian Splats Paper • 2508.05813 • Published 15 days ago • 4
Technical Report: Full-Stack Fine-Tuning for the Q Programming Language Paper • 2508.06813 • Published 13 days ago • 5