Generate spoken responses to voice or text input
Ultra-fast video model, LTX 0.9.7 13B distilled
3-minute animated video