==================================================================================
This model is a qint4-quantized version of https://huggingface.co/tencent/HunyuanImage-3.0, produced with https://github.com/huggingface/optimum-quanto; the weight files were saved using an unofficial method.
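For context, weight-only qint4 quantization with optimum-quanto generally follows the pattern below. This is a minimal sketch, not the exact script used to produce this checkpoint; loading the base model via AutoModelForCausalLM with trust_remote_code is an assumption about how the original weights load.

```python
# Minimal sketch of weight-only qint4 quantization with optimum-quanto.
# NOT the exact script used to produce this checkpoint; loading the base
# model via AutoModelForCausalLM/trust_remote_code is an assumption.
import torch
from transformers import AutoModelForCausalLM
from optimum.quanto import quantize, freeze, qint4

model = AutoModelForCausalLM.from_pretrained(
    "tencent/HunyuanImage-3.0",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Swap nn.Linear weights for qint4 tensors, then freeze() to replace the
# original bf16 weights with their quantized counterparts in place.
quantize(model, weights=qint4)
freeze(model)
```

A quanto model is typically serialized as a quantized state dict plus a JSON quantization map rather than a standard checkpoint, which is why reloading it needs custom code (hence the "unofficial" save format and loader below).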
This quantized model has so far been tested on a single H20 96GB GPU. It is loaded with unofficial code; see load_quantized_model.py, which currently includes two loading methods for reference (a hedged sketch of method 2 follows the summary below). Discussion, feedback, and joint study are all welcome, thank you!
Loading method 1: model initialization needs roughly 160GB of CPU RAM, with an initial GPU footprint of 50GB; once inference starts, CPU usage drops to about 70GB and GPU usage is about 55-60GB. Warnings about mismatched state-dict keys appear during loading, but they do not affect use.
Loading method 2: model initialization needs roughly 75GB of CPU RAM, with an initial GPU footprint of 50GB; once inference starts, CPU usage stays at 75GB and GPU usage is about 55-60GB. Because a key map is supplied, loading produces no warnings.
Both methods take roughly the same time at inference: about 12 minutes per image (9:16 or 16:9) on an H20.
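For orientation, method 2 (key map supplied, no warnings) matches optimum-quanto's documented requantize workflow: build an empty model skeleton, then materialize the quantized weights from a state dict plus a quantization map. Below is a minimal sketch with assumed file names; the actual working code is in load_quantized_model.py.

```python
# Minimal sketch of loading method 2: requantize an empty skeleton from a
# saved state dict plus a quantization map. File names are assumptions;
# see load_quantized_model.py in this repo for the actual working code.
import json
import torch
from transformers import AutoConfig, AutoModelForCausalLM
from optimum.quanto import requantize
from safetensors.torch import load_file

state_dict = load_file("model_qint4.safetensors")   # assumed file name
with open("quantization_map.json") as f:            # assumed file name
    qmap = json.load(f)

config = AutoConfig.from_pretrained(
    "tencent/HunyuanImage-3.0", trust_remote_code=True
)

# Building the skeleton on the meta device avoids allocating a full bf16
# copy of the 80B-parameter model in host RAM.
with torch.device("meta"):
    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

requantize(model, state_dict, qmap, device=torch.device("cuda"))
model.eval()
```

Skipping the bf16 materialization on the host would also plausibly explain method 2's lower CPU footprint (about 75GB vs. 160GB for method 1).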
==================================================================================
👏 Join our WeChat and Discord | 💻 Official website | Try our model!
If you develop or use HunyuanImage-3.0 in your projects, please let us know.
HunyuanImage-3.0 is a groundbreaking native multimodal model that unifies multimodal understanding and generation within an autoregressive framework. Our text-to-image module achieves performance comparable to or surpassing leading closed-source models.
🧠 Unified Multimodal Architecture: Moving beyond the prevalent DiT-based architectures, HunyuanImage-3.0 employs a unified autoregressive framework. This design enables a more direct and integrated modeling of text and image modalities, leading to surprisingly effective and contextually rich image generation.
🏆 The Largest Image Generation MoE Model: This is the largest open-source image generation Mixture of Experts (MoE) model to date. It features 64 experts and a total of 80 billion parameters, with 13 billion activated per token, significantly enhancing its capacity and performance.
🎨 Superior Image Generation Performance: Through rigorous dataset curation and advanced reinforcement learning post-training, we've achieved an optimal balance between semantic accuracy and visual excellence. The model demonstrates exceptional prompt adherence while delivering photorealistic imagery with stunning aesthetic quality and fine-grained details.
💭 Intelligent World-Knowledge Reasoning: The unified multimodal architecture endows HunyuanImage-3.0 with powerful reasoning capabilities. It leverages its extensive world knowledge to intelligently interpret user intent, automatically elaborating on sparse prompts with contextually appropriate details to produce superior, more complete visual outputs.
If you find HunyuanImage-3.0 useful in your research, please cite our work:
@article{cao2025hunyuanimage,
  title={HunyuanImage 3.0 Technical Report},
  author={Cao, Siyu and Chen, Hangting and Chen, Peng and Cheng, Yiji and Cui, Yutao and Deng, Xinchi and Dong, Ying and Gong, Kipper and Gu, Tianpeng and Gu, Xiusen and others},
  journal={arXiv preprint arXiv:2509.23951},
  year={2025}
}
We extend our heartfelt gratitude to the following open-source projects and communities for their invaluable contributions: