EMOVA Hugging Face


https://emova-ollm.github.io/

emova-ollm


AI & ML interests

Omni-modal Large Language Models, Multi-modal Large Language Models (MLLMs), Emotional spoken dialogue

Recent Activity

KaiChen1998 new activity 23 days ago

Emova-ollm/Qwen2.5-7B-Instruct_add_speech_token_4096_nostrip: Improve language tag

KaiChen1998 updated a dataset 2 months ago

Emova-ollm/emova-alignment-7m

KaiChen1998 published a Space 2 months ago

Emova-ollm/EMOVA-demo


Emova-ollm's activity

KaiChen1998 in Emova-ollm/Qwen2.5-7B-Instruct_add_speech_token_4096_nostrip, 23 days ago: "Improve language tag" (#1, opened by lbourdois)

KaiChen1998 posted an update 2 months ago


📢 Our EMOVA paper has been accepted to CVPR 2025, and we are glad to release all resources, including code (training & inference), datasets (training & evaluation), and checkpoints (EMOVA-3B/7B/72B)!

🤗 EMOVA is a novel end-to-end omni-modal LLM that can see, hear, and speak. Given omni-modal (i.e., textual, visual, and speech) inputs, EMOVA can generate both textual and speech responses with vivid emotional control, using a speech decoder and a style controller.

✨ EMOVA Highlights
✅ State-of-the-art omni-modality: EMOVA simultaneously achieves results comparable to the state of the art on both vision-language and speech benchmarks.
✅ Device adaptation: our codebase supports training/inference on both NVIDIA GPUs (e.g., A800 & H20) and Ascend NPUs (e.g., 910B3)!
✅ Modular design: we integrate multiple implementations of the vision encoder, vision projector, and language model, including the recent DeepSeekMoE-tiny!

🔥 You are all welcome to try it out and give us a star!
- Project page: https://emova-ollm.github.io/
- GitHub: https://github.com/emova-ollm/EMOVA
- Demo: Emova-ollm/EMOVA-demo