OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning Paper • 2505.04601 • Published 9 days ago • 20
Article • Remote VAEs for decoding with HF endpoints 🤗 • By hlky and 1 other • Feb 24 • 39
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7 • 181
Article • SmolVLM2: Bringing Video Understanding to Every Device • By orrzohar and 6 others • Feb 20 • 248
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 229
Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback Paper • 2501.03916 • Published Jan 7 • 15
Article • Fine-tune ModernBERT for text classification using synthetic data • By davidberenstein1957 • Dec 30, 2024 • 36
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up Paper • 2412.16112 • Published Dec 20, 2024 • 23
VisualLens: Personalization through Visual History Paper • 2411.16034 • Published Nov 25, 2024 • 18
UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages Paper • 2411.14343 • Published Nov 21, 2024 • 7
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models Paper • 2411.07232 • Published Nov 11, 2024 • 67
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation Paper • 2411.07975 • Published Nov 12, 2024 • 31
Article • Extending *Transformer layers as Painters* to DiT's • By NagaSaiAbhinay • Aug 31, 2024 • 11
Article • Train custom AI models with the trainer API and adapt them to 🤗 • By not-lain • Jun 29, 2024 • 33
Article • SeeMoE: Implementing a MoE Vision Language Model from Scratch • By AviSoori1x • Jun 23, 2024 • 34
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study Paper • 2404.14047 • Published Apr 22, 2024 • 46
Article • seemore: Implement a Vision Language Model from Scratch • By AviSoori1x • Jun 23, 2024 • 84