FAST: Efficient Action Tokenization for Vision-Language-Action Models Paper • 2501.09747 • Published 1 day ago • 12
Do generative video models learn physical principles from watching videos? Paper • 2501.09038 • Published 3 days ago • 10
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published 24 days ago • 94
Article Building a Real-Time Video Chat with Gemini 2.0, Gradio, and WebRTC 👀👂 By freddyaboulton • 5 days ago • 3
Article Train 400x faster Static Embedding Models with Sentence Transformers 3 days ago • 100
Towards Best Practices for Open Datasets for LLM Training Paper • 2501.08365 • Published 4 days ago • 39
3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering Paper • 2501.05131 • Published 9 days ago • 32
MangaNinja: Line Art Colorization with Precise Reference Following Paper • 2501.08332 • Published 4 days ago • 48
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published 8 days ago • 32
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature Paper • 2501.07171 • Published 5 days ago • 45
O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning Paper • 2501.06458 • Published 7 days ago • 29
Demystifying Domain-adaptive Post-training for Financial LLMs Paper • 2501.04961 • Published 9 days ago • 10
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains Paper • 2501.05707 • Published 8 days ago • 18