See the following links for reference:
[Original HuggingFace] https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava
[Original GitHub] https://github.com/fpgaminer/joycaption
[Quantized HuggingFace] https://huggingface.co/John6666/llama-joycaption-alpha-two-hf-llava-nf4
[ModelScope mirror for mainland China] https://modelscope.cn/models/muse/fancyfeast-llama-joycaption-alpha-two-hf-llava
Original introduction:
- Free and Open: It will be released for free, open weights, no restrictions, and just like bigASP, will come with training scripts and lots of juicy details on how it gets built.
- Uncensored: Equal coverage of SFW and NSFW concepts. No "cylindrical shaped object with a white substance coming out on it" here.
- Diversity: All are welcome here. Do you like digital art? Photoreal? Anime? Furry? JoyCaption is for everyone. Pains are being taken to ensure broad coverage of image styles, content, ethnicity, gender, orientation, etc.
- Minimal Filtering: JoyCaption is trained on large swathes of images so that it can understand almost all aspects of our world. almost. Illegal content will never be tolerated in JoyCaption's training.
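For me:
- The captions for many Stable Diffusion and Flux.1-Dev models were produced with JoyCaption, so using it may make later prompt tuning easier.
- Nowadays, Qwen2.5-VL also seems to be a good choice, though its support for NSFW samples may be lacking.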

Normally you do not need to run this command, because the model is already stored in the repository via Git LFS:
huggingface-cli download --local-dir models/llama-joycaption-alpha-two-hf-llava-nf4 John6666/llama-joycaption-alpha-two-hf-llava-nf4
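One way to decide whether the download above is needed is to check whether the model files in the checkout are real weights or unresolved Git LFS pointer files. The sketch below is an assumption-laden helper, not part of this repository: the function names are hypothetical, and it relies only on the fact that a Git LFS pointer file begins with the line `version https://git-lfs.github.com/spec/v1`. The directory path is the one used by the download command.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line (Git LFS pointer spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file is a Git LFS pointer rather than real content."""
    with path.open("rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def unresolved_lfs_files(model_dir: Path) -> list[Path]:
    """List files under model_dir that are still LFS pointers (hypothetical helper)."""
    return sorted(
        p for p in model_dir.rglob("*") if p.is_file() and is_lfs_pointer(p)
    )


if __name__ == "__main__":
    # Same local directory as in the download command.
    model_dir = Path("models/llama-joycaption-alpha-two-hf-llava-nf4")
    if not model_dir.is_dir():
        print(f"{model_dir} is missing; run the download command.")
    else:
        pointers = unresolved_lfs_files(model_dir)
        if pointers:
            print("Unresolved LFS pointers; run `git lfs pull` or the download command:")
            for p in pointers:
                print(f"  {p}")
        else:
            print("Model files look complete; no download needed.")
```

If the script reports unresolved pointers, either `git lfs pull` or the `huggingface-cli download` command should fetch the actual weights.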
uv is the recommended package manager:
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
uv python install 3.12
# Create and activate the virtual environment (uv venv creates .venv by default)
uv venv -p 3.12
source .venv/bin/activate
uv pip install -r requirements.txt -i https://mirrors.cloud.tencent.com/pypi/simple
Alternatively, use pip alone:
pip install -r requirements.txt -i https://mirrors.cloud.tencent.com/pypi/simple
Run:
python3 main.py example/*
In the CNB.cool environment, throughput is roughly ten images per minute.
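For planning a batch, the rough figure of ten images per minute can be turned into a time estimate. This is only back-of-the-envelope arithmetic: the function name is hypothetical, and the default rate is the approximate CNB.cool number quoted here, which will differ on other hardware.

```python
import math


def estimated_minutes(num_images: int, images_per_minute: float = 10.0) -> int:
    """Estimate captioning time in whole minutes, rounded up.

    The default rate is the approximate CNB.cool figure; pass your own
    measured rate for other environments.
    """
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    if images_per_minute <= 0:
        raise ValueError("images_per_minute must be positive")
    return math.ceil(num_images / images_per_minute)


if __name__ == "__main__":
    for n in (25, 1000):
        print(f"{n} images: about {estimated_minutes(n)} minutes")
```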