DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement Paper • 2305.08227 • Published May 14, 2023 • 1
Article How to generate text: using different decoding methods for language generation with Transformers By patrickvonplaten • Mar 1, 2020 • 219
Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published Apr 10 • 29
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published 25 days ago • 103
Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data By danaaubakirova and 8 others • 25 days ago • 167
Article LTX-Video LoRA training study (Single image/style training) By neph1 • Jan 14 • 3
PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers Paper • 2506.05573 • Published 22 days ago • 68
FlexPainter: Flexible and Multi-View Consistent Texture Generation Paper • 2506.02620 • Published 25 days ago • 14
SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers Paper • 2506.00830 • Published 27 days ago • 7
Article Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints By sergeipetrov and 3 others • May 1, 2024 • 77
MedGemma Release Collection Collection of Gemma 3 variants for performance on medical text and image comprehension to accelerate building healthcare-based AI applications. • 4 items • Updated 29 days ago • 171
Article Improving Prompt Consistency with Structured Generations By willkurt and 2 others • Apr 30, 2024 • 64
Article Blazingly fast whisper transcriptions with Inference Endpoints By mfuntowicz and 5 others • May 13 • 70
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection Paper • 2505.07293 • Published May 12 • 26