view article Article Unlocking Healthcare AI: I'm Releasing State-of-the-Art Medical Models for Free. Forever. By MaziyarPanahi • 2 days ago • 57
view article Article Introducing ColQwen-Omni: Retrieve in every modality By manu and 4 others • 2 days ago • 39
view article Article Seq vs Seq: the Ettin Suite of Paired Encoders and Decoders By orionweller and 5 others • 3 days ago • 33
NextCoder Collection NextCoder family of code-editing LMs developed with Selective Knowledge Transfer and its training data. • 6 items • Updated 10 days ago • 67
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks Paper • 2507.01955 • Published 16 days ago • 34
EmoNet Collection The full collection of our EmoNet effort. More info available at: https://huggingface.co/blog/felfri/emonet • 8 items • Updated 27 days ago • 5
SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning Paper • 2506.21355 • Published 22 days ago • 9
DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement Paper • 2305.08227 • Published May 14, 2023 • 1
view article Article How to generate text: using different decoding methods for language generation with Transformers By patrickvonplaten • Mar 1, 2020 • 222
Scaling Laws for Native Multimodal Models Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published Apr 10 • 29
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2 • 114
view article Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data By danaaubakirova and 8 others • Jun 3 • 202
view article Article LTX-Video LoRA training study (Single image/style training) By neph1 • Jan 14 • 3
PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers Paper • 2506.05573 • Published Jun 5 • 71