ComfyUI HunyuanVideo-Foley Custom Node
This is a ComfyUI custom node wrapper for the HunyuanVideo-Foley model, which generates realistic audio from video and text descriptions.
- Text-Video-to-Audio Synthesis: Generate realistic audio that matches your video content
- Flexible Text Prompts: Use optional text descriptions to guide audio generation
- Multiple Samples: Generate up to 6 different audio variations per inference
- Configurable Parameters: Control guidance scale, inference steps, and sampling
- Seed Control: Reproducible results with seed parameter
- Model Caching: Efficient model loading and reuse across generations
- Automatic Model Downloads: Models are automatically downloaded to ComfyUI/models/foley/ when needed
1. Clone this repository into your ComfyUI custom_nodes directory:
   cd ComfyUI/custom_nodes
   git clone https://github.com/if-ai/ComfyUI_HunyuanVideoFoley.git
2. Install dependencies:
   cd ComfyUI_HunyuanVideoFoley
   pip install -r requirements.txt
3. Run the installation script (recommended):
   python install.py
4. Restart ComfyUI to load the new nodes.
The models can be obtained in two ways:
Option 1: Automatic Download (Recommended)
- Models will be automatically downloaded to ComfyUI/models/foley/ when you first run the node
- No manual setup required
- Progress will be shown in the ComfyUI console
Option 2: Manual Download
- Download models from HuggingFace
- Place models in ComfyUI/models/foley/ (recommended) or the ./pretrained_models/ directory
- Ensure the config file is at configs/hunyuanvideo-foley-xxl.yaml
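If you prefer to script the manual download, a minimal sketch using the huggingface_hub library might look like this. Note that the repository id and the helper names (models_present, download_foley_models) are assumptions for illustration, not part of this node; check the project's HuggingFace page for the actual repository name.

```python
import os


def models_present(model_dir: str) -> bool:
    """True if the main Foley checkpoint is already in place."""
    return os.path.isfile(os.path.join(model_dir, "hunyuanvideo_foley.pth"))


def download_foley_models(
    model_dir: str = "ComfyUI/models/foley/hunyuanvideo-foley-xxl",
    repo_id: str = "tencent/HunyuanVideo-Foley",  # assumed repo id; verify before use
) -> None:
    """Fetch the model snapshot only if it is not already present."""
    if not models_present(model_dir):
        # Requires: pip install huggingface_hub
        from huggingface_hub import snapshot_download

        snapshot_download(repo_id=repo_id, local_dir=model_dir)
```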
1. HunyuanVideo-Foley Generator
Main node for generating audio from video and text.
Inputs:
- video: Video input (VIDEO type)
- text_prompt: Text description of desired audio (STRING)
- guidance_scale: CFG scale for generation control (1.0-10.0, default: 4.5)
- num_inference_steps: Number of denoising steps (10-100, default: 50)
- sample_nums: Number of audio samples to generate (1-6, default: 1)
- seed: Random seed for reproducibility (INT)
- model_path: Path to pretrained models (optional, leave empty for auto-download)
Outputs:
- video_with_audio: Video with generated audio merged (VIDEO)
- audio_only: Generated audio file (AUDIO)
- status_message: Generation status and info (STRING)
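The documented input ranges can be enforced with a small clamp helper before calling the node. This is a hypothetical sketch, not code from the node itself; it simply mirrors the ranges listed above.

```python
def clamp_params(guidance_scale: float, num_inference_steps: int, sample_nums: int):
    """Clamp generator inputs to the documented ranges (hypothetical helper).

    guidance_scale: 1.0-10.0, num_inference_steps: 10-100, sample_nums: 1-6.
    """
    guidance_scale = min(max(guidance_scale, 1.0), 10.0)
    num_inference_steps = min(max(num_inference_steps, 10), 100)
    sample_nums = min(max(sample_nums, 1), 6)
    return guidance_scale, num_inference_steps, sample_nums
```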
Frame Count & Duration Limits
- Maximum Frames: 450 frames (hard limit)
- Maximum Duration: 15 seconds at 30fps
- Recommended: Keep videos ≤15 seconds for best results
- 30fps: Max 15 seconds (450 frames)
- 24fps: Max 18.75 seconds (450 frames)
- 15fps: Max 30 seconds (450 frames)
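The duration limits above all follow from the single 450-frame cap, so the maximum duration is just 450 divided by the frame rate. A one-line sketch (hypothetical helper, not part of the node):

```python
FRAME_LIMIT = 450  # hard frame limit per inference


def max_duration_seconds(fps: float) -> float:
    """Longest video (in seconds) that fits within the 450-frame limit."""
    return FRAME_LIMIT / fps
```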
For videos longer than 15 seconds:
- Reduce FPS: Lower FPS allows longer duration within frame limit
- Segment Processing: Split long videos into 15s segments
- Audio Merging: Combine generated audio segments in post-processing
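The segment-processing strategy can be sketched as splitting the total frame count into chunks of at most 450 frames, generating audio per chunk, and concatenating the results afterwards. The helper below is illustrative only:

```python
def segment_frame_ranges(total_frames: int, limit: int = 450):
    """Split a frame count into (start, end) ranges of at most `limit` frames.

    Each range can then be processed as an independent segment whose
    generated audio is concatenated in post-processing.
    """
    return [
        (start, min(start + limit, total_frames))
        for start in range(0, total_frames, limit)
    ]
```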
1. Load Video: Use a "Load Video" node to input your video file
2. Add Generator: Add the "HunyuanVideo-Foley Generator" node
3. Connect Video: Connect the video output to the generator's video input
4. Set Prompt: Enter a text description (e.g., "A person walks on frozen ice")
5. Adjust Settings: Configure guidance scale, steps, and sample count as needed
6. Generate: Run the workflow to generate audio
The node expects the following model structure:
ComfyUI/models/foley/hunyuanvideo-foley-xxl/
├── hunyuanvideo_foley.pth        # Main Foley model
├── vae_128d_48k.pth              # DAC VAE model
└── synchformer_state_dict.pth    # Synchformer model
configs/
└── hunyuanvideo-foley-xxl.yaml   # Configuration file
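A quick sanity check against this layout could look like the following sketch. verify_layout is a hypothetical helper; it only checks that the files listed above exist under the given directories.

```python
import os


def verify_layout(foley_dir: str, node_dir: str) -> list:
    """Return expected files (per the layout above) that are missing.

    foley_dir: the ComfyUI/models/foley directory.
    node_dir:  the ComfyUI_HunyuanVideoFoley custom-node directory.
    """
    model_subdir = os.path.join(foley_dir, "hunyuanvideo-foley-xxl")
    expected = [
        os.path.join(model_subdir, "hunyuanvideo_foley.pth"),
        os.path.join(model_subdir, "vae_128d_48k.pth"),
        os.path.join(model_subdir, "synchformer_state_dict.pth"),
        os.path.join(node_dir, "configs", "hunyuanvideo-foley-xxl.yaml"),
    ]
    return [p for p in expected if not os.path.isfile(p)]
```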
If you find this tool useful, please consider supporting my work by:
You can also support by reporting issues or suggesting features. Your contributions help me bring updates and improvements to the project.
This custom node is based on the HunyuanVideo-Foley project. Please check the original project's license terms.
Based on the HunyuanVideo-Foley project by Tencent. Original paper and code available at: