Generate text based on an image and user input
Ask questions about images in a multimodal chat