-
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 67 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 126 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 53 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 85
Collections
Discover the best community collections!
Collections including paper arxiv:2407.08083
-
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Paper • 2407.08083 • Published • 27 -
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper • 2408.11039 • Published • 56 -
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Paper • 2408.15237 • Published • 36 -
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper • 2409.11355 • Published • 28
-
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Paper • 2401.09417 • Published • 59 -
VMamba: Visual State Space Model
Paper • 2401.10166 • Published • 38 -
DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis
Paper • 2405.14224 • Published • 12 -
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 138
-
DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention
Paper • 2309.14327 • Published • 21 -
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Paper • 2407.08083 • Published • 27 -
Memory^3: Language Modeling with Explicit Memory
Paper • 2407.01178 • Published • 3 -
Teaching Transformers Causal Reasoning through Axiomatic Training
Paper • 2407.07612 • Published • 2
-
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Paper • 2404.02258 • Published • 104 -
Jamba: A Hybrid Transformer-Mamba Language Model
Paper • 2403.19887 • Published • 104 -
EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba
Paper • 2403.09977 • Published • 9 -
SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series
Paper • 2403.15360 • Published • 11
-
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 124 -
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
Paper • 2306.16527 • Published • 47 -
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Paper • 2404.12387 • Published • 38 -
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
Paper • 2404.16790 • Published • 7