gguf quantized version of wan video

  • drag gguf to > ./ComfyUI/models/diffusion_models
  • drag t5xxl-um to > ./ComfyUI/models/text_encoders
  • drag vae to > ./ComfyUI/models/vae
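
if you prefer scripting the setup, here is a minimal sketch using huggingface_hub (repo and file names are examples taken from this readme; substitute the quant you actually want, and fetch the vae into ./ComfyUI/models/vae the same way):

```python
# minimal sketch: pull the gguf files straight into the ComfyUI folders
# (file names are examples; pick the quant you actually want from the repo)
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="calcuis/wan-gguf",                # diffusion model repo from this readme
    filename="wan2.1-v5-vace-1.3b-q4_0.gguf",  # example quant, also used below
    local_dir="./ComfyUI/models/diffusion_models",
)
hf_hub_download(
    repo_id="chatpig/umt5xxl-encoder-gguf",    # t5xxl-um encoder repo from this readme
    filename="umt5xxl-encoder-q4_0.gguf",
    local_dir="./ComfyUI/models/text_encoders",
)
```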

screenshot

workflow

  • for i2v model, drag clip-vision-h to > ./ComfyUI/models/clip_vision
  • run the .bat file in the main directory (assuming you are using the gguf pack below)
  • if you opt to use the fp8 scaled umt5xxl encoder (this applies to any fp8 scaled t5, actually), please use cpu offload: switch from default to cpu under device in the gguf clip loader; it won't affect speed; btw, cpu offload also works fine for both gguf umt5xxl and gguf vae
  • drag any demo video (below) into your browser to load its workflow

screenshot

review

  • pig is a lazy architecture for the gguf node; it applies to all model, encoder and vae gguf file(s); if you try to run it with the comfyui-gguf node, you might need to manually add pig to its IMG_ARCH_LIST (in loader.py; see the sketch after this list), which is easier than editing the gguf file itself; btw, any model architecture compatible with comfyui-gguf, including wan, should work in the gguf node
  • 1.3b model: t2v and vace ggufs are working fine; good for old or low-end machines
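
for reference, the comfyui-gguf tweak mentioned above is just a one-line edit to loader.py; the entries shown here are illustrative (your copy may list different architectures), so treat this as a sketch:

```python
# ComfyUI-GGUF/loader.py -- add "pig" so pig-arch gguf files pass the arch check
# (keep whatever entries your copy already has; "pig" is the only addition)
IMG_ARCH_LIST = {"flux", "sd1", "sdxl", "wan", "pig"}
```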

run it with diffusers🧨 (alternative 1)

```python
import torch
from transformers import UMT5EncoderModel
from diffusers import AutoencoderKLWan, WanVACEPipeline, WanVACETransformer3DModel, GGUFQuantizationConfig
from diffusers.schedulers.scheduling_unipc_multistep import UniPCMultistepScheduler
from diffusers.utils import export_to_video

# load the quantized transformer directly from the gguf file
model_path = "https://huggingface.co/calcuis/wan-gguf/blob/main/wan2.1-v5-vace-1.3b-q4_0.gguf"
transformer = WanVACETransformer3DModel.from_single_file(
    model_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

# quantized umt5xxl text encoder
text_encoder = UMT5EncoderModel.from_pretrained(
    "chatpig/umt5xxl-encoder-gguf",
    gguf_file="umt5xxl-encoder-q4_0.gguf",
    torch_dtype=torch.bfloat16,
)

# vae loaded in fp32
vae = AutoencoderKLWan.from_pretrained(
    "callgg/wan-decoder",
    subfolder="vae",
    torch_dtype=torch.float32,
)

pipe = WanVACEPipeline.from_pretrained(
    "callgg/wan-decoder",
    transformer=transformer,
    text_encoder=text_encoder,
    vae=vae,
    torch_dtype=torch.bfloat16,
)

flow_shift = 3.0  # 5.0 for 720p, 3.0 for 480p
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=flow_shift)
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

prompt = "a pig moving quickly in a beautiful winter scenery nature trees sunset tracking camera"
negative_prompt = "blurry ugly bad"

output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=720,
    height=480,
    num_frames=57,
    num_inference_steps=24,
    guidance_scale=2.5,
    conditioning_scale=0.0,
    generator=torch.Generator().manual_seed(0),
).frames[0]
export_to_video(output, "output.mp4", fps=16)
```
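
note: loading gguf checkpoints this way assumes fairly recent `diffusers` and `transformers` builds, with the `gguf` package installed in the same environment.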

run it with gguf-connector (alternative 2)
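
assuming the connector is installed (it is on pypi, so `pip install gguf-connector` should pull it in), run: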

```
ggc v2
```

screenshot

update

  • wan2.1-v5-vace-1.3b: except the block weights, everything is kept in f32 (this avoids triggering the time/text embedding key error during inference); see the sketch below to verify a file's dtype layout
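
to double-check the dtype layout described above, a small sketch with the gguf python package (reader api as in recent gguf releases; point it at your local download):

```python
# sketch: list tensor names and quant types in a gguf file,
# e.g. to confirm non-block weights were kept in f32
from gguf import GGUFReader

reader = GGUFReader("wan2.1-v5-vace-1.3b-q4_0.gguf")  # local path to your download
for tensor in reader.tensors:
    # tensor_type is a GGMLQuantizationType enum (F32, Q4_0, ...)
    print(tensor.name, tensor.tensor_type.name)
```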
