AI & ML interests
Feeling and building multimodal intelligence.
as a general evaluator for assessing model performance
a model good at arbitrary types of visual input
Long Context Transfer From Text To Vision: https://lmms-lab.github.io/posts/longva/
Some powerful image models.
CVPR 2025 - EgoLife: Towards Egocentric Life Assistant. Homepage: https://egolife-ai.github.io/
A collection of SAEs hooked onto LLaVA.
Models focused on video understanding (previously known as LLaVA-NeXT-Video).
Dataset Collection of LMMs-Eval
- LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
  Paper • 2407.07895 • Published • 43
- lmms-lab/llava-next-interleave-qwen-7b
  Text Generation • 8B • Updated • 695 • 26
- lmms-lab/llava-next-interleave-qwen-7b-dpo
  Text Generation • 8B • Updated • 55 • 11
- lmms-lab/M4-Instruct-Data
  Updated • 1.31k • 70
A Lite version of the dataset, made to accelerate holistic evaluation during model development.