Incredible model, much better than Veo 3 or even Sora 2
Thank you for open-sourcing LTX-2 — this is an extraordinary piece of work. I’ve been testing it in some workflows, and the temporal consistency + speed are genuinely impressive even with minimal setup. It’s rare to see this level of quality and practicality in an open model.
Huge thanks to the team for the contribution to the community — please keep up the amazing work!
please share 🤯 (outputs)
OK, here is a 10-second video generated on a 5090 with 96 GB of system RAM, at 1440x816 resolution; the workflow is embedded in the video. SageAttention is enabled, and Gemma is optimized using https://huggingface.co/unsloth/gemma-3-12b-it-bnb-4bit. Block modification in `gemma_encoder.py`, line 493 ff.:

```python
def ltxv_gemma_clip(encoder_path, ltxv_path, processor=None, dtype=None):
    class _LTXVGemmaTextEncoderModel(LTXVGemmaTextEncoderModel):
        def __init__(self, device="cpu", dtype=dtype, model_options={}):
            dtype = torch.bfloat16  # TODO: make this configurable
            # Keep the Gemma encoder weights on CPU instead of GPU
            gemma_model = Gemma3ForConditionalGeneration.from_pretrained(
                encoder_path,
                local_files_only=True,
                torch_dtype=dtype,
                device_map={"": "cpu"},
            )
```

This pins the encoder to CPU to avoid OOMs on second runs.
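The patch above uses a simple subclass-override pattern: a thin wrapper around the original encoder class that pins the dtype and device regardless of what the caller passes in. A minimal, framework-free sketch of that pattern (class and attribute names here are illustrative stand-ins, not the real LTX-2 API):

```python
class TextEncoder:
    """Stand-in for the base text-encoder class (hypothetical names)."""
    def __init__(self, device="cuda", dtype="fp16"):
        self.device = device
        self.dtype = dtype


class PatchedTextEncoder(TextEncoder):
    """Thin wrapper mirroring the gemma_encoder.py modification:
    force bfloat16 and CPU placement no matter what the caller asks for,
    so repeated runs don't re-allocate encoder weights on the GPU."""
    def __init__(self, device="cpu", dtype=None, model_options=None):
        # Ignore caller-supplied device/dtype and pin to CPU + bf16,
        # analogous to device_map={"": "cpu"} and torch.bfloat16 above.
        super().__init__(device="cpu", dtype="bf16")


# Even if the pipeline requests GPU/fp32, the patched class overrides it:
enc = PatchedTextEncoder(device="cuda", dtype="fp32")
print(enc.device, enc.dtype)  # cpu bf16
```

The same effect could also be achieved by passing `device_map` and `torch_dtype` through from the caller, but hard-pinning keeps the one-line patch self-contained.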
Thank you so much for the kind words!
We’re excited to keep pushing this forward with the community’s support 🚀