
Qwen/Qwen2.5-Omni-7B • Any-to-Any • 11B
This collection includes all the models, datasets, and Spaces mentioned in the blog post "Vision Language Models: 2025 Update".
Interact with Qwen using text, audio, video, or images
A unified multimodal understanding and generation model.
Chat with images, videos, or PDFs to generate text
Ask questions about images to get answers
Answer questions using images or videos
Generate responses using images and text input
Annotate and describe images with text prompts
Generate text or segment objects from an image
Demo for ShieldGemma 2, a multimodal safety model
Check if text and images are safe
Chat with Qwen2.5-VL-72B using text and files
Chat with images and videos using Qwen
Generate responses to video or image inputs