
conda create -n SRPO python=3.10.16 -y
conda activate SRPO
bash ./env_setup.sh
💡 The environment dependencies are basically the same as DanceGRPO.
| Model | Huggingface Download URL |
|---|---|
| SRPO | diffusion_pytorch_model |
Download diffusion_pytorch_model.safetensors from [https://huggingface.co/tencent/SRPO](https://huggingface.co/tencent/SRPO):
mkdir ./srpo
huggingface-cli login
huggingface-cli download --resume-download Tencent/SRPO diffusion_pytorch_model.safetensors --local-dir ./srpo/
Download black-forest-labs/FLUX.1-dev from [https://huggingface.co/black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev):
mkdir ./data/flux
huggingface-cli login
huggingface-cli download --resume-download black-forest-labs/FLUX.1-dev --local-dir ./data/flux
You can use it in ComfyUI.
Load the following image in ComfyUI to get the workflow, or directly load the JSON file SRPO-workflow:
Tip: the workflow JSON is embedded in the image file's metadata.
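If you prefer to pull the workflow out of the image programmatically, ComfyUI usually stores it in the PNG text metadata under a `workflow` key. A minimal sketch (the filename `srpo_workflow.png` is hypothetical):

```python
# Sketch: extract the ComfyUI workflow JSON embedded in a PNG's metadata.
# Assumes the workflow sits under the "workflow" text key, ComfyUI's usual convention.
import json
from PIL import Image

img = Image.open("srpo_workflow.png")      # hypothetical filename
workflow_json = img.info.get("workflow")   # PNG text chunks land in .info
if workflow_json is not None:
    with open("srpo_workflow.json", "w", encoding="utf-8") as f:
        f.write(workflow_json)
    print("nodes:", len(json.loads(workflow_json).get("nodes", [])))
```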

import torch
from diffusers import FluxPipeline
from safetensors.torch import load_file

prompt = 'The Death of Ophelia by John Everett Millais, Pre-Raphaelite painting, Ophelia floating in a river surrounded by flowers, detailed natural elements, melancholic and tragic atmosphere'

# Load the base FLUX.1-dev pipeline
pipe = FluxPipeline.from_pretrained(
    './data/flux',
    torch_dtype=torch.bfloat16,
    use_safetensors=True
).to("cuda")

# Swap in the SRPO-finetuned transformer weights
state_dict = load_file("./srpo/diffusion_pytorch_model.safetensors")
pipe.transformer.load_state_dict(state_dict)

# Fix the seed for reproducible sampling
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    prompt,
    guidance_scale=3.5,
    height=1024,
    width=1024,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=generator
).images[0]
image.save("srpo_example.png")
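If GPU memory is limited, diffusers' standard model CPU offloading can be used instead of moving the whole pipeline to CUDA; this is an optional variation, not part of the original script:

```python
# Optional (sketch): lower peak VRAM by offloading idle submodules to CPU.
# Use this in place of .to("cuda") when constructing the pipeline.
pipe.enable_model_cpu_offload()
```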
To run inference with our example cases, replace model_path in vis.py and launch:
torchrun --nnodes=1 --nproc_per_node=8 \
--node_rank 0 \
--rdzv_endpoint $CHIEF_IP:29502 \
--rdzv_id 456 \
vis.py
Download black-forest-labs/FLUX.1-dev to ./data/flux:
mkdir data
mkdir ./data/flux
huggingface-cli login
huggingface-cli download --resume-download black-forest-labs/FLUX.1-dev --local-dir ./data/flux
Download the HPS-v2.1 checkpoint to ./data/hps_ckpt:
mkdir ./data/hps_ckpt
huggingface-cli login
huggingface-cli download --resume-download xswu/HPSv2 HPS_v2.1_compressed.pt --local-dir ./data/hps_ckpt
huggingface-cli download --resume-download laion/CLIP-ViT-H-14-laion2B-s32B-b79K open_clip_pytorch_model.bin --local-dir ./data/hps_ckpt
Download PickScore to ./data/ps:
mkdir ./data/ps
huggingface-cli login
python ./scripts/huggingface/download_hf.py --repo_id yuvalkirstain/PickScore_v1 --local_dir ./data/ps
python ./scripts/huggingface/download_hf.py --repo_id laion/CLIP-ViT-H-14-laion2B-s32B-b79K --local_dir ./data/clip
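If the helper script is unavailable in your setup, the same repositories can be fetched with huggingface_hub directly; a sketch assuming the same target directories:

```python
# Sketch: equivalent downloads via huggingface_hub instead of
# ./scripts/huggingface/download_hf.py (same repo IDs and target dirs).
from huggingface_hub import snapshot_download

snapshot_download(repo_id="yuvalkirstain/PickScore_v1", local_dir="./data/ps")
snapshot_download(repo_id="laion/CLIP-ViT-H-14-laion2B-s32B-b79K", local_dir="./data/clip")
```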
# Write training prompts into ./prompts.txt. Note: for online RL, no image-text pairs are needed; only inference prompts are required, supplied via ./prompts.txt.
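This README does not show the prompt-file layout; a minimal sketch assuming one plain-text prompt per line:

```python
# Sketch: write training prompts into ./prompts.txt, one per line.
# The one-prompt-per-line layout is an assumption, not confirmed here.
prompts = [
    "a cinematic photo of a red fox standing in fresh snow",
    "an oil painting of a lighthouse at dusk with dramatic clouds",
]
with open("./prompts.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(prompts) + "\n")
```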
# Pre-extract text embeddings from your custom training dataset—this boosts training efficiency.
bash scripts/preprocess/preprocess_flux_rl_embeddings.sh
cp videos2caption2.json ./data/rl_embeddings
HPS-v2.1 serves as the Reward Model in our reinforcement learning process.
bash scripts/finetune/SRPO_training_hpsv2.sh
(Optional) PickScore serves as the Reward Model in our reinforcement learning process.
bash scripts/finetune/SRPO_training_ps.sh
⚠️ The current control words are designed for HPS-v2.1, so training with PickScore may yield suboptimal results compared with HPS-v2.1 due to this mismatch.
Run distributed training with pdsh.
#!/bin/bash
echo "$NODE_IP_LIST" | tr ',' '\n' | sed 's/:8$//' | grep -v '1.1.1.1' > /tmp/pssh.hosts
node_ip=$(paste -sd, /tmp/pssh.hosts)
pdsh -w "$node_ip" "conda activate SRPO; cd <project path>; bash scripts/finetune/SRPO_training_hpsv2.sh"
Tips for training with your own model or dataset:
- Modify preprocess_flux_embedding.py and latent_flux_rl_datasets.py to pre-extract text embeddings from your custom training dataset; this boosts training efficiency.
- Use args.vis_sampling_step to modify sigma_schedule (see the sketch below). Typically, this value matches the model's regular inference steps.
- For best results, use these settings as a starting point and adjust for your model/dataset.
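For orientation, flow-matching samplers for FLUX commonly derive a sigma schedule from a step count by linear spacing from 1 to 0; the sketch below only illustrates that relationship and may differ from the repo's actual sigma_schedule construction:

```python
# Illustrative sketch only: a linear flow-matching sigma schedule built from a
# step count (e.g. args.vis_sampling_step). The training code may construct
# sigma_schedule differently.
import torch

def linear_sigma_schedule(num_steps: int) -> torch.Tensor:
    # num_steps + 1 boundaries from sigma=1 (pure noise) down to sigma=0 (clean)
    return torch.linspace(1.0, 0.0, num_steps + 1)

print(linear_sigma_schedule(50))
```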
We referenced the following works and appreciate their contributions to the community.
If you find SRPO useful for your research and applications, please cite using this BibTeX:
@misc{shen2025directlyaligningdiffusiontrajectory,
  title={Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference},
  author={Xiangwei Shen and Zhimin Li and Zhantao Yang and Shiyi Zhang and Yingfang Zhang and Donghao Li and Chunyu Wang and Qinglin Lu and Yansong Tang},
  year={2025},
  eprint={2509.06942},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2509.06942},
}