Generate a text answer from an image and a question about it
Generate images by repairing or modifying their masked areas