Transform video frames using text instructions
Generate images from text prompts
Generate and convert speech using text and audio inputs