Extract entities and their types from Chinese questions
Segment objects in images using points or text
Generate virtual try-on images from custom clothes and poses