Magma-8B model for UI Agents
Interact with an agent to perform web-based tasks
Generate a realistic talking video from an image and audio
Dia - 1.6B Text-to-Dialogue Model
F5-TTS & E2-TTS: Zero-Shot Voice Cloning (Unofficial Demo)
Generate descriptions from images using masks
Explore multilingual LLM benchmark results
Conversational speech generation
Audio to Talking Face
Demo of the Phi-4 multimodal model
ML-powered speech synthesis directly in your browser
Browse tools and agents to use in smolagents
Generate text or segment objects from an image
Next-generation reasoning model that runs locally in-browser