UI-TARS
Generate click coordinates from image and instruction
Generate realistic audio from text
Generate text based on a prompt
Generate talking face animation from still images and audio
Generate animated faces from still images and videos
Generate a talking face video from text