Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control Feb 4 • 152
Article LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning? Jul 25, 2024 • 17
Searching for Better ViT Baselines Collection Exploring ViT hparams and model shapes for the GPU poor (between tiny and base). • 28 items • Updated Feb 14 • 18
MobileNetV4 pretrained weights Collection Weights for MobileNet-V4 pretrained in timm • 17 items • Updated Sep 22, 2024 • 18
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens Paper • 2406.11271 • Published Jun 17, 2024 • 21
What If We Recaption Billions of Web Images with LLaMA-3? Paper • 2406.08478 • Published Jun 12, 2024 • 42
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations Paper • 2405.18392 • Published May 28, 2024 • 12
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published May 24, 2024 • 47
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 16 items • Updated Apr 3 • 146