Wang Chengyao

wcy1122

https://wcy1122.github.io/

AI & ML interests

Multimodal Intelligence

Recent Activity

updated a Space 3 days ago

wcy1122/MGM-Omni

new activity 3 days ago

wcy1122/MGM-Omni:Thanks a Million for This HF Space!

reacted to their post with 🚀 4 days ago

🚀 Introducing MGM-Omni, an omni-chatbot that processes text, image, video, and speech inputs and generates both text and speech responses. 👂 MGM-Omni supports hour-level audio understanding. 🗣️ MGM-Omni supports 10-minute speech generation and voice cloning. For more details, please check: 📝 Blog: https://mgm-omni.notion.site/MGM-Omni-An-Open-source-Omni-Chatbot-2395728e0b0180149ac9f24683fc9907 🌟 Code: https://github.com/dvlab-research/MGM-Omni 🤖 Model: https://huggingface.co/collections/wcy1122/mgm-omni-6896075e97317a88825032e1 🎮 Demo: https://huggingface.co/spaces/wcy1122/MGM-Omni

upvoted a collection 5 days ago

DeepSeek-V3.1

3 items • Updated 3 days ago • 203

upvoted a collection 6 days ago

MGM-Omni

An open-source Omni Chatbot for Long Audio and Voice Clone • 12 items • Updated 7 days ago • 6

upvoted 2 papers about 1 month ago

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

Paper • 2507.13348 • Published Jul 17 • 72

Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10 • 157

upvoted a paper 8 months ago

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

Paper • 2412.09501 • Published Dec 12, 2024 • 49

upvoted a paper 9 months ago

VisionZip: Longer is Better but Not Necessary in Vision Language Models

Paper • 2412.04467 • Published Dec 5, 2024 • 118

upvoted a collection about 1 year ago

Llama 3.1

This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Dec 6, 2024 • 683

upvoted 2 collections over 1 year ago

MGM-Data

Official data collection for the paper "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" • 2 items • Updated Apr 21, 2024 • 7

MGM

Official model collection for the paper "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" • 13 items • Updated May 3, 2024 • 47

upvoted a paper over 1 year ago

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Paper • 2403.18814 • Published Mar 27, 2024 • 48