😊 Welcome!
V1.0:
| Name | Storage Space | Hugging Face | ModelScope | Description |
|---|---|---|---|---|
| Wan2.1-Fun-1.3B-InP | 19.0 GB | 🤗Link | 😄Link | Wan2.1-Fun-1.3B text-to-video weights, trained at multiple resolutions, supporting start and end frame prediction. |
| Wan2.1-Fun-14B-InP | 47.0 GB | 🤗Link | 😄Link | Wan2.1-Fun-14B text-to-video weights, trained at multiple resolutions, supporting start and end frame prediction. |
| Wan2.1-Fun-1.3B-Control | 19.0 GB | 🤗Link | 😄Link | Wan2.1-Fun-1.3B video control weights, supporting various control conditions such as Canny, Depth, Pose, MLSD, etc., and trajectory control. Supports multi-resolution (512, 768, 1024) video prediction at 81 frames, trained at 16 frames per second, with multilingual prediction support. |
| Wan2.1-Fun-14B-Control | 47.0 GB | 🤗Link | 😄Link | Wan2.1-Fun-14B video control weights, supporting various control conditions such as Canny, Depth, Pose, MLSD, etc., and trajectory control. Supports multi-resolution (512, 768, 1024) video prediction at 81 frames, trained at 16 frames per second, with multilingual prediction support. |
DSW offers free GPU time, which a user can apply for once; it remains valid for 3 months after application.
Aliyun provides free GPU time through FreeTier. Claim it and use it in Aliyun PAI-DSW to start CogVideoX-Fun within 5 minutes!
Our ComfyUI workflow is shown below; please refer to the ComfyUI README for details.

If you are using Docker, please make sure that the GPU driver and CUDA environment have been installed correctly on your machine.
Then execute the following commands:
```sh
# pull image
docker pull mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easycv/torch_cuda:cogvideox_fun

# enter image
docker run -it -p 7860:7860 --network host --gpus all --security-opt seccomp:unconfined --shm-size 200g mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easycv/torch_cuda:cogvideox_fun

# clone code
git clone https://github.com/aigc-apps/CogVideoX-Fun.git

# enter CogVideoX-Fun's dir
cd CogVideoX-Fun

# download weights
mkdir models/Diffusion_Transformer
mkdir models/Personalized_Model

# Please use the huggingface link or modelscope link to download the model.
# CogVideoX-Fun
# https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-InP
# https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-5b-InP

# Wan
# https://huggingface.co/alibaba-pai/Wan2.1-Fun-14B-InP
# https://modelscope.cn/models/PAI/Wan2.1-Fun-14B-InP
```
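If you prefer to script the weight download instead of using the links above, the minimal Python sketch below uses the official `huggingface_hub` client with one of the repo IDs listed in the comments above; the target directory and the choice of client are assumptions you can adapt (ModelScope provides a similar `snapshot_download`).

```python
# Sketch: download the Wan2.1-Fun-14B-InP weights from Hugging Face into the
# models/Diffusion_Transformer folder. Requires `pip install huggingface_hub`.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="alibaba-pai/Wan2.1-Fun-14B-InP",
    local_dir="models/Diffusion_Transformer/Wan2.1-Fun-14B-InP",
)
```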
We have verified that this repo runs in the following environments:
The details for Windows:
The details for Linux:
About 60 GB of free disk space is needed to store the weights, so please check before downloading!
The weights should be placed in the following directory structure:
```
📦 models/
├── 📂 Diffusion_Transformer/
│   ├── 📂 CogVideoX-Fun-V1.1-2b-InP/
│   ├── 📂 CogVideoX-Fun-V1.1-5b-InP/
│   ├── 📂 Wan2.1-Fun-14B-InP/
│   └── 📂 Wan2.1-Fun-1.3B-InP/
├── 📂 Personalized_Model/
│   └── your trained transformer model / your trained lora model (for UI load)
```
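As a quick sanity check, the short Python sketch below verifies the roughly 60 GB of free space mentioned above and creates the expected folders; the 60 GB figure and folder names come from this README, while the script itself is only an illustration, not part of the repo.

```python
# Sketch: check free disk space and create the expected model folders.
import shutil
from pathlib import Path

REQUIRED_GB = 60  # rough space needed for the downloaded weights
free_gb = shutil.disk_usage(".").free / 1024**3
if free_gb < REQUIRED_GB:
    print(f"Warning: only {free_gb:.1f} GB free, about {REQUIRED_GB} GB is recommended.")

for sub in ("Diffusion_Transformer", "Personalized_Model"):
    Path("models", sub).mkdir(parents=True, exist_ok=True)
```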
Since Wan2.1 has a very large number of parameters, we need memory optimization strategies so it can run on consumer-grade GPUs. Each prediction file provides a `GPU_memory_mode` option, allowing you to choose between `model_cpu_offload`, `model_cpu_offload_and_qfloat8`, and `sequential_cpu_offload`. This option also applies to CogVideoX-Fun generation.
- `model_cpu_offload`: The entire model is moved to the CPU after use, saving some GPU memory.
- `model_cpu_offload_and_qfloat8`: The entire model is moved to the CPU after use, and the transformer model is quantized to float8, saving more GPU memory.
- `sequential_cpu_offload`: Each layer of the model is moved to the CPU after use. It is slower but saves a significant amount of GPU memory.

`qfloat8` may slightly reduce model performance but saves more GPU memory. If you have sufficient GPU memory, it is recommended to use `model_cpu_offload`.
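For intuition, here is a hypothetical sketch of how these three modes map onto a diffusers-style pipeline; the function name, the `pipeline` attributes, and the naive float8 cast are assumptions for illustration, not the repo's actual implementation (which, for example, dequantizes float8 weights on the fly during computation).

```python
# Hypothetical illustration of the three GPU_memory_mode options for a
# diffusers-style pipeline; not the repo's actual implementation.
import torch

def apply_memory_mode(pipeline, gpu_memory_mode: str):
    if gpu_memory_mode == "model_cpu_offload":
        # Each sub-model stays on the CPU and is moved to the GPU only while it runs.
        pipeline.enable_model_cpu_offload()
    elif gpu_memory_mode == "model_cpu_offload_and_qfloat8":
        # Same offloading, plus storing the transformer weights in float8.
        for p in pipeline.transformer.parameters():
            p.data = p.data.to(torch.float8_e4m3fn)
        pipeline.enable_model_cpu_offload()
    elif gpu_memory_mode == "sequential_cpu_offload":
        # Layer-by-layer offloading: slowest option, lowest GPU memory use.
        pipeline.enable_sequential_cpu_offload()
    else:
        raise ValueError(f"Unknown GPU_memory_mode: {gpu_memory_mode}")
    return pipeline
```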
For details, refer to the ComfyUI README.
Using Python code:
- Step 1: Download the corresponding weights and place them in the `models` folder.
- Step 2: Different prediction files are provided under the `examples` folder. This library currently supports CogVideoX-Fun, Wan2.1, and Wan2.1-Fun; the models are distinguished by folder names under `examples`, and their supported features vary. Use them accordingly. Below is an example using CogVideoX-Fun:
Text-to-Video:
- Step 1: Modify `prompt`, `neg_prompt`, `guidance_scale`, and `seed` in the file `examples/cogvideox_fun/predict_t2v.py`.
- Step 2: Run `examples/cogvideox_fun/predict_t2v.py` and wait for the results. The generated videos will be saved in the folder `samples/cogvideox-fun-videos`.

Image-to-Video:
- Step 1: Modify `validation_image_start`, `validation_image_end`, `prompt`, `neg_prompt`, `guidance_scale`, and `seed` in the file `examples/cogvideox_fun/predict_i2v.py`. `validation_image_start` is the starting image of the video, and `validation_image_end` is the ending image of the video.
- Step 2: Run `examples/cogvideox_fun/predict_i2v.py` and wait for the results. The generated videos will be saved in the folder `samples/cogvideox-fun-videos_i2v`.

Video-to-Video:
- Step 1: Modify `validation_video`, `validation_image_end`, `prompt`, `neg_prompt`, `guidance_scale`, and `seed` in the file `examples/cogvideox_fun/predict_v2v.py`. `validation_video` is the reference video for video-to-video generation. You can use the following demo video: Demo Video.
- Step 2: Run `examples/cogvideox_fun/predict_v2v.py` and wait for the results. The generated videos will be saved in the folder `samples/cogvideox-fun-videos_v2v`.

Controlled Video Generation (Canny, Pose, Depth, etc.):
- Step 1: Modify `control_video`, `validation_image_end`, `prompt`, `neg_prompt`, `guidance_scale`, and `seed` in the file `examples/cogvideox_fun/predict_v2v_control.py`. `control_video` is the control video extracted using operators such as Canny, Pose, or Depth. You can use the following demo video: Demo Video.
- Step 2: Run `examples/cogvideox_fun/predict_v2v_control.py` and wait for the results. The generated videos will be saved in the folder `samples/cogvideox-fun-videos_v2v_control`.

If you have trained your own LoRA model, modify `lora_path` and the relevant paths in `examples/{model_name}/predict_t2v.py` or `examples/{model_name}/predict_i2v.py` as needed. A generic sketch of the kind of seeded, prompt-driven call these scripts make is shown below.
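The sketch below shows how `prompt`, `neg_prompt`, `guidance_scale`, and `seed` typically feed into a generation call. It uses the stock diffusers `CogVideoXPipeline` with the public `THUDM/CogVideoX-2b` weights as a stand-in, so treat it as a generic illustration of the parameters rather than the exact code in the repo's prediction scripts, which load the CogVideoX-Fun / Wan2.1-Fun pipelines instead.

```python
# Generic, hedged illustration of a seeded text-to-video call; the repo's own
# scripts use their own pipeline classes, but the parameters play the same role.
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()  # see the GPU_memory_mode discussion above

prompt = "A panda playing a guitar in a bamboo forest, cinematic lighting."
neg_prompt = "blurry, low quality, distorted"
guidance_scale = 6.0
seed = 43

video = pipe(
    prompt=prompt,
    negative_prompt=neg_prompt,
    guidance_scale=guidance_scale,
    num_inference_steps=50,
    generator=torch.Generator(device="cpu").manual_seed(seed),
).frames[0]

export_to_video(video, "example_t2v.mp4", fps=8)
```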
Using the web UI:

The web UI supports text-to-video, image-to-video, video-to-video, and controlled video generation (Canny, Pose, Depth, etc.). This library currently supports CogVideoX-Fun, Wan2.1, and Wan2.1-Fun. Different models are distinguished by folder names under the `examples` folder, and their supported features vary. Use them accordingly. Below is an example using CogVideoX-Fun:
- Step 1: Download the corresponding weights and place them in the `models` folder.
- Step 2: Run `examples/cogvideox_fun/app.py` to access the Gradio interface.
- Step 3: Fill in `prompt`, `neg_prompt`, `guidance_scale`, and `seed`, click "Generate," and wait for the results. The generated videos will be saved in the `sample` folder.

This project is licensed under the Apache License (Version 2.0).