- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 124
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 50
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 12
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 65
Collections
Collections including paper arxiv:2402.05472
- Question Aware Vision Transformer for Multimodal Reasoning
  Paper • 2402.05472 • Published • 8
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 38
- WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
  Paper • 2402.05930 • Published • 39
- More Agents Is All You Need
  Paper • 2402.05120 • Published • 51
- Attention Is All You Need
  Paper • 1706.03762 • Published • 44
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 8
- Mixtral of Experts
  Paper • 2401.04088 • Published • 158
- Mistral 7B
  Paper • 2310.06825 • Published • 47
- Kosmos-2.5: A Multimodal Literate Model
  Paper • 2309.11419 • Published • 50
- Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
  Paper • 2311.05698 • Published • 9
- Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
  Paper • 2311.06242 • Published • 84
- PolyMaX: General Dense Prediction with Mask Transformer
  Paper • 2311.05770 • Published • 6