Generate audio from text input, with an optional audio prompt
Generate detailed images from text prompts
Ask text questions about images