This is a GGUF quantized version of LTX-2.
unsloth/LTX-2-GGUF uses Unsloth Dynamic 2.0 methodology for SOTA performance.
This model card focuses on the LTX-2 model, as presented in the paper LTX-2: Efficient Joint Audio-Visual Foundation Model. The codebase is available here.
LTX-2 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model. It brings together the core building blocks of modern video generation, with open weights and a focus on practical, local execution.
| Name | Notes |
|---|---|
| ltx-2-19b-dev | The full model, flexible and trainable in bf16 |
| ltx-2-19b-dev-fp8 | The full model in fp8 quantization |
| ltx-2-19b-dev-fp4 | The full model in nvfp4 quantization |
| ltx-2-19b-distilled | The distilled version of the full model, 8 steps, CFG=1 |
| ltx-2-19b-distilled-lora-384 | A LoRA version of the distilled model applicable to the full model |
| ltx-2-spatial-upscaler-x2-1.0 | An x2 spatial upscaler for the LTX-2 latents, used in multi-stage (multiscale) pipelines for higher resolution |
| ltx-2-temporal-upscaler-x2-1.0 | An x2 temporal upscaler for the LTX-2 latents, used in multi-stage (multiscale) pipelines for higher FPS |
LTX-2 is accessible right away via the following links:
You can use the models (full, distilled, upscalers, and any derivatives of them) for any purpose permitted under the license.
We recommend using the built-in LTXVideo nodes, available through the ComfyUI Manager. For manual installation instructions, please refer to our documentation site.
The LTX-2 codebase is a monorepo with several packages, from the model definition in `ltx-core` to pipelines in `ltx-pipelines` and training capabilities in `ltx-trainer`. The codebase was tested with Python >= 3.12 and CUDA > 12.7, and supports PyTorch ~= 2.7.
```bash
git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2

# From the repository root
uv sync
source .venv/bin/activate
```
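After syncing, you can quickly confirm the environment matches the requirements listed above. The snippet below uses only standard PyTorch introspection calls and is not specific to LTX-2:

```python
# Quick environment check against the requirements listed above.
import sys
import torch

print(f"Python:        {sys.version.split()[0]}")   # expected >= 3.12
print(f"PyTorch:       {torch.__version__}")        # expected ~= 2.7
print(f"CUDA (build):  {torch.version.cuda}")       # expected > 12.7
print(f"GPU available: {torch.cuda.is_available()}")
```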
To use our model, please follow the instructions in our `ltx-pipelines` package.
LTX-2 is supported in the Diffusers Python library for image-to-video generation.
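Below is a minimal sketch of image-to-video generation with Diffusers. It assumes the LTX-2 checkpoints load through the existing `LTXImageToVideoPipeline` interface; the checkpoint ID, conditioning image, prompt, and resolution/step settings are placeholders, and the exact pipeline class and recommended settings for LTX-2 may differ, so check the Diffusers documentation for the supported usage:

```python
import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# Placeholder checkpoint ID -- replace with the LTX-2 checkpoint you want to use.
pipe = LTXImageToVideoPipeline.from_pretrained(
    "Lightricks/LTX-2", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# The conditioning image and prompts below are illustrative only.
image = load_image("https://example.com/first_frame.png")
prompt = "A slow cinematic dolly shot of a lighthouse at dusk, waves crashing below"
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

video = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=768,
    height=512,
    num_frames=121,
    num_inference_steps=40,
).frames[0]

export_to_video(video, "output.mp4", fps=24)
```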
The base (dev) model is fully trainable.
The LoRAs and IC-LoRAs published with the model can be reproduced by following the instructions in the LTX-2 Trainer README.
Training for motion, style or likeness (sound+appearance) can take less than an hour in many settings.
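Once a LoRA is trained (or when using the published ltx-2-19b-distilled-lora-384), it can typically be applied on top of the base model at inference time. Below is a hedged sketch using Diffusers' standard LoRA-loading API; the repo ID, weight file name, and adapter name are placeholders, and the exact integration for LTX-2 may differ from what the Trainer README describes:

```python
import torch
from diffusers import LTXImageToVideoPipeline

# Placeholder IDs/paths -- replace with the actual LTX-2 checkpoint and LoRA weights.
pipe = LTXImageToVideoPipeline.from_pretrained(
    "Lightricks/LTX-2", torch_dtype=torch.bfloat16
)
pipe.load_lora_weights(
    "path/to/lora", weight_name="lora.safetensors", adapter_name="custom"
)
pipe.set_adapters(["custom"], adapter_weights=[1.0])
pipe.to("cuda")

# For the distilled LoRA, the table above lists 8 steps with CFG=1, e.g.:
# pipe(..., num_inference_steps=8, guidance_scale=1.0)
```

The `set_adapters` call is optional when only a single adapter is loaded; it becomes useful when mixing several LoRAs at different weights.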
```bibtex
@article{hacohen2025ltx2,
  title={LTX-2: Efficient Joint Audio-Visual Foundation Model},
  author={HaCohen, Yoav and Brazowski, Benny and Chiprut, Nisan and Bitterman, Yaki and Kvochko, Andrew and Berkowitz, Avishai and Shalem, Daniel and Lifschitz, Daphna and Moshe, Dudu and Porat, Eitan and Richardson, Eitan and Shiran, Guy and Chachy, Itay and Chetboun, Jonathan and Finkelson, Michael and Kupchick, Michael and Zabari, Nir and Guetta, Nitzan and Kotler, Noa and Bibi, Ofir and Gordon, Ori and Panet, Poriya and Benita, Roi and Armon, Shahar and Kulikov, Victor and Inger, Yaron and Shiftan, Yonatan and Melumian, Zeev and Farbman, Zeev},
  journal={arXiv preprint arXiv:2601.03233},
  year={2025}
}
```