VLMEvalKit Evaluation Results Collection
Blind vote on HF TTS models!
Generate images from text descriptions