Spark TTS
A text-to-speech model powered by SparkAudio and Mobvoi.
Demos for the Phi-4-mini-instruct model
PDF to Structured Data powered by Google DeepMind Gemini 2.0
Compare the latest VAEs
Break the language barrier
Large Language Diffusion Models
Generate depth maps from your images
Generate edited images using text prompts and styles
Interact with a multimodal AI model using text, images, and audio
Wan: Open and Advanced Large-Scale Video Generative Models
Execute commands from the environment
The ultimate guide to training LLMs on large GPU clusters
Compare SigLIP1 and SigLIP2 on zero shot classification
Automatically discover creative knowledge inside diffusion models
Detect objects in images or videos
Process audio and generate text output based on instructions
Image generator/customization/personalization
Gradio demo for MatAnyone
Generate a podcast using Kokoro-TTS!