Transform video frames using text instructions
Set up and customize Stable Diffusion WebUI
Generate and convert speech using text and audio inputs