OmniGen2 is a powerful and efficient unified multimodal model. Unlike OmniGen v1, OmniGen2 features two distinct decoding pathways for the text and image modalities, utilizing unshared parameters and a decoupled image tokenizer. OmniGen2 delivers competitive performance across four primary capabilities: visual understanding, text-to-image generation, instruction-guided image editing, and in-context generation.
As an open-source project, OmniGen2 provides a powerful yet resource-efficient foundation for researchers and developers exploring the frontiers of controllable and personalized generative AI.
We will release the training code, dataset, and data construction pipeline soon. Stay tuned!
Demonstration of OmniGen2's overall capabilities.
Demonstration of OmniGen2's image editing capabilities.
Demonstration of OmniGen2's in-context generation capabilities.
# 1. Clone the repo
git clone git@github.com:VectorSpaceLab/OmniGen2.git
cd OmniGen2
# 2. (Optional) Create a clean Python environment
conda create -n omnigen2 python=3.11
conda activate omnigen2
# 3. Install dependencies
# 3.1 Install PyTorch (choose correct CUDA version)
pip install torch==2.6.0 torchvision --extra-index-url https://download.pytorch.org/whl/cu124
# 3.2 Install other required packages
pip install -r requirements.txt
# Note: Version 2.7.4.post1 is pinned for compatibility with CUDA 12.4.
# Feel free to use a newer version if you are on CUDA 12.6 or once the compatibility issue is fixed.
# OmniGen2 runs even without flash-attn, though we recommend installing it for best performance.
pip install flash-attn==2.7.4.post1 --no-build-isolation
# Alternative for users in mainland China: install PyTorch from a domestic mirror
pip install torch==2.6.0 torchvision --index-url https://mirror.sjtu.edu.cn/pytorch-wheels/cu124
# Install other dependencies from Tsinghua mirror
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
# Note: Version 2.7.4.post1 is pinned for compatibility with CUDA 12.4.
# Feel free to use a newer version if you are on CUDA 12.6 or once the compatibility issue is fixed.
# OmniGen2 runs even without flash-attn, though we recommend installing it for best performance.
pip install flash-attn==2.7.4.post1 --no-build-isolation -i https://pypi.tuna.tsinghua.edu.cn/simple
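Whichever installation route you take, a quick sanity check confirms that PyTorch can see the GPU and whether flash-attn is available. This is an optional convenience snippet, not part of the official setup:
# Optional environment sanity check
import torch
print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
try:
    import flash_attn
    print("flash-attn", flash_attn.__version__)
except ImportError:
    print("flash-attn not installed; OmniGen2 falls back to default attention")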
# Visual Understanding
bash example_understanding.sh
# Text-to-image generation
bash example_t2i.sh
# Instruction-guided image editing
bash example_edit.sh
# In-context generation
bash example_in_context_generation.sh
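For reference, here is a minimal Python sketch of the kind of call these scripts wrap. The import path, class name, and checkpoint id are assumptions modeled on a diffusers-style pipeline, not the repo's confirmed API; consult the example scripts above for the actual entry point:
# Hypothetical sketch -- import path and model id are assumptions.
import torch
from omnigen2 import OmniGen2Pipeline  # assumed import path

pipe = OmniGen2Pipeline.from_pretrained(
    "OmniGen2/OmniGen2",               # assumed Hugging Face model id
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(prompt="A red panda reading a book under a maple tree").images[0]
image.save("output_t2i.png")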
Online Demo: HF Spaces. Beyond Hugging Face Spaces, we are temporarily allocating additional GPU resources to ensure smooth access to the online demos. If you notice a long queue for a particular link, please try one of the other links.
# For image generation only
pip install gradio
python app.py
# Optional: share the demo via a public link (requires access to huggingface.co)
python app.py --share
# For generating images or text (chat demo)
pip install gradio
python app_chat.py
To achieve optimal results with OmniGen2, you can adjust the following key hyperparameters based on your specific use case (a usage sketch follows the list).
text_guidance_scale: Controls how strictly the output adheres to the text prompt (classifier-free guidance, CFG).
image_guidance_scale: Controls how closely the final image resembles the input reference image.
max_pixels: Automatically resizes an input image when its total pixel count (width × height) exceeds this limit, while preserving its aspect ratio. This helps manage performance and memory usage.
max_input_image_side_length: Maximum side length for input images.
negative_prompt: Tells the model what you do not want to see in the image.
enable_model_cpu_offload: Reduces VRAM usage by nearly 50% with a negligible impact on speed.
enable_sequential_cpu_offload: Minimizes VRAM usage to less than 3GB, but at the cost of significantly slower performance.
cfg_range_start, cfg_range_end: Define the timestep range over which CFG is applied. Per the referenced paper, reducing cfg_range_end can significantly decrease inference time with a negligible impact on quality.
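To illustrate how these knobs fit together, the hedged sketch below passes them as keyword arguments to the hypothetical pipeline from the earlier sketch. The parameter names mirror the list above, but whether they are pipeline kwargs or CLI flags depends on the actual scripts, and input_images is an assumed name for the reference-image argument:
# Illustrative only: `pipe` is the hypothetical pipeline from above.
from PIL import Image

reference_image = Image.open("input.png")
image = pipe(
    prompt="Replace the sky with a vivid sunset",
    input_images=[reference_image],     # assumed kwarg for image editing
    negative_prompt="blurry, low quality, watermark",
    text_guidance_scale=5.0,            # adherence to the text prompt (CFG)
    image_guidance_scale=1.5,           # resemblance to the reference image
    max_pixels=1024 * 1024,             # resize when width * height exceeds this
    max_input_image_side_length=2048,
    cfg_range_start=0.0,
    cfg_range_end=0.6,                  # lowering this speeds up inference
).images[0]
image.save("output_edit.png")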
Some suggestions for improving generation quality:
OmniGen2 natively requires an NVIDIA RTX 3090 or an equivalent GPU with approximately 17GB of VRAM. For devices with less VRAM, you can enable CPU offload to run the model.
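Assuming the pipeline follows the diffusers offloading convention, the two offload modes map onto the standard methods:
# Assumed to follow the diffusers offloading convention.
pipe.enable_model_cpu_offload()         # ~50% less VRAM, negligible slowdown
# pipe.enable_sequential_cpu_offload()  # under 3GB of VRAM, but much slower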
Performance Tip: To improve inference speed, consider decreasing the cfg_range_end parameter. Within a reasonable range, this has a negligible impact on output quality.
The following table details the inference efficiency of OmniGen2 on an A800 GPU:
Inference Efficiency of OmniGen2.
If you find this repository or our work useful, please consider giving it a star ⭐ and a citation 🦖, which would be greatly appreciated (the OmniGen2 report will be made available as soon as possible):
@article{wu2025omnigen2,
  title={OmniGen2: Exploration to Advanced Multimodal Generation},
  author={Chenyuan Wu and Pengfei Zheng and Ruiran Yan and Shitao Xiao and Xin Luo and Yueze Wang and Wanli Li and Xiyan Jiang and Yexin Liu and Junjie Zhou and Ze Liu and Ziyi Xia and Chaofan Li and Haoge Deng and Jiahao Wang and Kun Luo and Bo Zhang and Defu Lian and Xinlong Wang and Zhongyuan Wang and Tiejun Huang and Zheng Liu},
  journal={arXiv preprint arXiv:2506.18871},
  year={2025}
}
This work is licensed under the Apache 2.0 License.