LMMs-Lab

community

https://www.lmms-lab.com/

lmmslab

EvolvingLMMs-Lab

AI & ML interests

Feeling and building multimodal intelligence.

Recent Activity

THUdyh authored a paper 18 days ago

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Jingkang authored a paper 19 days ago

Sparse Mixture-of-Experts are Domain Generalizable Learners

Jingkang authored a paper 19 days ago

Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models

Papers

A Simple Baseline for Streaming Video Understanding

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence

lmms-lab's collections (18)

OneVision-Encoder

HEVC-Style Vision Transformer

lmms-lab-encoder/onevision-encoder-large

0.3B • Updated Feb 5 • 1.76k • 14
lmms-lab-encoder/onevision-encoder-large-lang

Updated Feb 10 • 149 • 8

OpenMMReasoner

OpenMMReasoner/OpenMMReasoner-ColdStart

Image-Text-to-Text • 8B • Updated Dec 30, 2025 • 391 • 3
OpenMMReasoner/OpenMMReasoner-RL

Image-Text-to-Text • 8B • Updated Dec 30, 2025 • 129 • 17
OpenMMReasoner/OpenMMReasoner-SFT-874K

Viewer • Updated Dec 30, 2025 • 874k • 225 • 6
OpenMMReasoner/OpenMMReasoner-RL-74K

Viewer • Updated Nov 25, 2025 • 74.7k • 237 • 9

LLaVA-Critic-R1

lmms-lab/LLaVA-Critic-R1-7B

8B • Updated Jul 19, 2025 • 380
lmms-lab/LLaVA-Critic-R1-7B-Plus-Qwen

8B • Updated Jul 26, 2025 • 45 • 5
lmms-lab/LLaVA-Critic-R1-7B-Plus-Mimo

8B • Updated Aug 28, 2025 • 20 • 1
lmms-lab/LLaVA-Critic-R1-7B-LLaMA32v

11B • Updated Aug 28, 2025 • 5

Aero-1-Audio

Aero 1 Audio Demo

Demo for Aero-1-Audio

Space • 43
lmms-lab/Aero-1-Audio

Text Generation • 2B • Updated Jun 7, 2025 • 513 • 91

VideoMMMU

Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos

Paper • 2501.13826 • Published Jan 23, 2025 • 24
lmms-lab/VideoMMMU

Viewer • Updated May 5, 2025 • 900 • 3.59k • 13

LLaVA-Critic

A general evaluator for assessing model performance.

LLaVA-Critic: Learning to Evaluate Multimodal Models

Paper • 2410.02712 • Published Oct 3, 2024 • 37
lmms-lab/llava-critic-7b

8B • Updated Oct 4, 2024 • 1.14k • 15
lmms-lab/llava-critic-72b

73B • Updated Oct 4, 2024 • 4 • 15
lmms-lab/llava-critic-113k

Viewer • Updated Oct 5, 2024 • 113k • 924 • 28

LLaVA-OneVision

A model that handles arbitrary types of visual input.

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6, 2024 • 61
lmms-lab/LLaVA-OneVision-Mid-Data

Viewer • Updated Aug 26, 2024 • 563k • 92 • 21
lmms-lab/LLaVA-OneVision-Data

Viewer • Updated May 24, 2025 • 3.94M • 19.2k • 235
lmms-lab/LLaVA-NeXT-Data

Viewer • Updated Aug 30, 2024 • 779k • 4.87k • 46

LongVA

Long Context Transfer From Text To Vision: https://lmms-lab.github.io/posts/longva/

Long Context Transfer from Language to Vision

Paper • 2406.16852 • Published Jun 24, 2024 • 33
lmms-lab/LongVA-7B

Text Generation • 8B • Updated Jun 26, 2024 • 243 • 15
lmms-lab/LongVA-7B-DPO

Text Generation • 8B • Updated Jun 26, 2024 • 357 • 10
lmms-lab/v_niah_needles

Viewer • Updated Jun 15, 2024 • 5 • 20 • 4

LLaVA-NeXT

Some powerful image models.

lmms-lab/llava-next-110b

Text Generation • 112B • Updated May 14, 2024 • 15 • 21
lmms-lab/llava-next-72b

Text Generation • 73B • Updated Aug 22, 2024 • 170 • 14
lmms-lab/llava-next-qwen-32b

Text Generation • 33B • Updated Jul 16, 2024 • 41 • 7
lmms-lab/llama3-llava-next-8b

Text Generation • Updated Aug 17, 2024 • 2.06k • 106

LongVT

LongVT Demo

Analyze long videos and answer questions about them

Space • 3
longvideotool/LongVT-RL

Video-Text-to-Text • Updated Dec 4, 2025 • 373 • 3
longvideotool/LongVT-SFT

Video-Text-to-Text • Updated Dec 4, 2025 • 61 • 1
longvideotool/LongVT-RFT

Video-Text-to-Text • Updated Dec 4, 2025 • 250 • 1

LLaVA-OneVision-1.5

https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5

mvp-lab/LLaVA-OneVision-1.5-Instruct-Data

Viewer • Updated Nov 21, 2025 • 21.9M • 96k • 71
mvp-lab/LLaVA-OneVision-1.5-Mid-Training-85M

Viewer • Updated Nov 24, 2025 • 91.5M • 162k • 69
lmms-lab/LLaVA-OneVision-1.5-8B-Instruct

Image-Text-to-Text • 9B • Updated Oct 21, 2025 • 14.9k • 62
lmms-lab/LLaVA-OneVision-1.5-4B-Instruct

Image-Text-to-Text • 5B • Updated Feb 6 • 7.8k • 18

MMSearch-R1

MMSearch-R1 is a solution designed to train LMMs to perform on-demand multimodal search in real-world environments.

lmms-lab/MMSearch-R1-7B-0807

8B • Updated Aug 7, 2025 • 5
lmms-lab/MMSearch-R1-7B

8B • Updated Jul 30, 2025 • 89 • 9
lmms-lab/FVQA

Viewer • Updated Aug 9, 2025 • 6.66k • 420 • 7
MMSearch-R1: Incentivizing LMMs to Search

Paper • 2506.20670 • Published Jun 25, 2025 • 64

EgoLife

CVPR 2025 - EgoLife: Towards Egocentric Life Assistant. Homepage: https://egolife-ai.github.io/

EgoLife: Towards Egocentric Life Assistant

Paper • 2503.03803 • Published Mar 5, 2025 • 46
EgoGPT

Analyze video to describe actions and transcribe audio

Space • 14
lmms-lab/EgoIT-99K

Viewer • Updated Mar 7, 2025 • 199k • 7.43k • 9
lmms-lab/EgoLife

Viewer • Updated Mar 13, 2025 • 32k • 23.3k • 18

Multimodal-SAE

A collection of sparse autoencoders (SAEs) hooked into LLaVA.

Multimodal SAE

Demo for Multimodal-SAE

Space • 9
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models

Paper • 2411.14982 • Published Nov 22, 2024 • 19
lmms-lab/llava-sae-explanations-5k

Viewer • Updated Nov 22, 2024 • 9.8k • 57 • 5
lmms-lab/llama3-llava-next-8b-hf-sae-131k

Updated Nov 26, 2024 • 10 • 7

LLaVA-Video

Models focused on video understanding (previously known as LLaVA-NeXT-Video).

Video Instruction Tuning With Synthetic Data

Paper • 2410.02713 • Published Oct 3, 2024 • 41
lmms-lab/LLaVA-Video-178K

Viewer • Updated Oct 11, 2024 • 1.63M • 43.6k • 194
lmms-lab/LLaVA-Video-7B-Qwen2

Video-Text-to-Text • 8B • Updated Oct 25, 2024 • 25.7k • 125
lmms-lab/LLaVA-Video-72B-Qwen2

Text Generation • 73B • Updated Oct 25, 2024 • 182 • 22

LMMs-Eval

Dataset collection for LMMs-Eval

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

Paper • 2407.12772 • Published Jul 17, 2024 • 35
lmms-lab/VQAv2

Viewer • Updated Jan 26, 2024 • 770k • 23k • 32
lmms-lab/MME

Viewer • Updated Dec 23, 2023 • 2.37k • 39.1k • 32
lmms-lab/DocVQA

Viewer • Updated Apr 18, 2024 • 16.6k • 29.3k • 78

LLaVA-Next-Interleave

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Paper • 2407.07895 • Published Jul 10, 2024 • 42
lmms-lab/llava-next-interleave-qwen-7b

Text Generation • 8B • Updated Jul 24, 2024 • 285 • 27
lmms-lab/llava-next-interleave-qwen-7b-dpo

Text Generation • 8B • Updated Jul 12, 2024 • 120 • 12
lmms-lab/M4-Instruct-Data

Updated Jul 21, 2024 • 1.22k • 78

LMMs-Eval-Lite

Lite versions of the datasets to accelerate holistic evaluation during model development.

lmms-lab/LMMs-Eval-Lite

Viewer • Updated Jul 4, 2024 • 8.5k • 11.3k • 7
lmms-lab/llava-bench-in-the-wild

Viewer • Updated Mar 8, 2024 • 60 • 6.21k • 10
lmms-lab/CMMMU

Viewer • Updated Mar 8, 2024 • 12k • 536 • 4
lmms-lab/MMMU

Viewer • Updated Mar 8, 2024 • 11.6k • 42.7k • 7