Z-Image / LongCat-Image LoRA Training Studio

An efficient LoRA fine-tuning tool based on the AC-RF (Anchor-Coupled Rectified Flow) algorithm

Supports: Z-Image Turbo | LongCat-Image

Chinese README (中文版)


✨ Features

| Feature | Description |
| --- | --- |
| 🎯 Anchor-Coupled Sampling | Trains only at key timesteps; efficient and stable |
| 10-Step Fast Inference | Preserves the Turbo model's acceleration structure |
| 📉 Min-SNR Weighting | Reduces loss fluctuation across timesteps |
| 🎨 Multiple Loss Modes | Frequency-aware / style-structure / unified |
| 🔧 Auto Hardware Optimization | Detects the GPU and auto-configures (Tier S/A/B) |
| 🖥️ Modern WebUI | Vue.js + FastAPI full-stack interface |
| 📊 Real-time Monitoring | Loss curves, progress, VRAM monitoring |
| 🏷️ Ollama Tagging | One-click AI image captioning |
| 🔄 Multi-Model Support | Switch between Z-Image / LongCat-Image |

🚀 Quick Start

Step 1: Install PyTorch (Required)

Choose based on your CUDA version:

```bash
# CUDA 12.8 (RTX 40 series recommended)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

# CUDA 12.4
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

# CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# CUDA 11.8 (older GPUs)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
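
After installing, a quick check (plain PyTorch calls, not part of this project) confirms the wheel matches your driver:

```python
# Verify the installed PyTorch build sees your GPU.
import torch

print("torch:", torch.__version__)        # e.g. 2.5.1+cu124
print("CUDA build:", torch.version.cuda)  # e.g. 12.4
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```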

Step 2: Install Flash Attention (Recommended)

Flash Attention significantly reduces VRAM usage and speeds up training.

Linux - Download from Flash Attention Releases:

```bash
# Check your environment versions
python --version                                      # e.g. Python 3.12
python -c "import torch; print(torch.version.cuda)"   # e.g. 12.8

# Download the matching wheel (example: Python 3.12 + CUDA 12 + PyTorch 2.5)
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu12torch2.5cxx11abiFALSE-cp312-cp312-linux_x86_64.whl

# Install
pip install flash_attn-*.whl
```

Windows - Download prebuilt from AI-windows-whl:

```bat
:: Example: Python 3.13 + CUDA 12.8 + PyTorch 2.9.1 (cp313 in the wheel name = Python 3.13)
pip install https://huggingface.co/Wildminder/AI-windows-whl/resolve/main/flash_attn-2.8.3+cu128torch2.9.1cxx11abiTRUE-cp313-cp313-win_amd64.whl
```

Tip: If no matching version exists, skip this step. The program will automatically use SDPA as fallback.
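
To confirm which attention backend you'll get, a minimal check (standard imports only):

```python
# flash-attn is optional; the trainer falls back to PyTorch SDPA without it.
try:
    import flash_attn
    print("flash-attn available:", flash_attn.__version__)
except ImportError:
    print("flash-attn not installed; SDPA fallback will be used")
```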

Step 3: Install Diffusers (Required)

⚠️ Note: This project requires diffusers 0.36+ (a dev version); install it from git:

```bash
pip install git+https://github.com/huggingface/diffusers.git
```
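
You can confirm the dev build was picked up (the exact dev version string will vary):

```python
# The project requires diffusers 0.36+, which is only on the main branch.
import diffusers
print("diffusers:", diffusers.__version__)  # expect something like 0.36.0.dev0
```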

Step 4: One-Click Deploy

Linux / Mac

```bash
# Clone the project
git clone https://github.com/None9527/None_Z-image-Turbo_trainer.git
cd None_Z-image-Turbo_trainer

# One-click dependency install
chmod +x setup.sh
./setup.sh

# Edit config (set model paths)
cp env.example .env
nano .env

# Start the service
./start.sh
```

Windows

```bat
:: Clone the project
git clone https://github.com/None9527/None_Z-image-Turbo_trainer.git
cd None_Z-image-Turbo_trainer

:: One-click install (double-click or run from a terminal)
setup.bat

:: Edit config (set model paths)
copy env.example .env
notepad .env

:: Start the service
start.bat
```

Step 5: Access Web UI

After deployment, open http://localhost:9198 in your browser.
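
If the page doesn't load, a minimal reachability check (assumes the default port; this is not a documented API endpoint, it just confirms the server is listening):

```python
# Ping the Web UI root to confirm the service is up.
from urllib.request import urlopen

with urlopen("http://localhost:9198", timeout=5) as resp:
    print("Web UI responded with HTTP", resp.status)
```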


📦 Manual Installation (Optional)

Use the steps below if the one-click deploy fails.

⚠️ Prerequisites

  • Python 3.10+
  • Node.js 18+ (for frontend build)
  • npm or pnpm

Installation Steps

```bash
# 1. Install Python dependencies
pip install -r requirements.txt

# 2. Install the latest diffusers
pip install git+https://github.com/huggingface/diffusers.git

# 3. Install this project
pip install -e .

# 4. Build the frontend (important!)
cd webui-vue
npm install    # or pnpm install
npm run build  # generates the dist directory
cd ..

# 5. Create the config file
cp env.example .env

# 6. Start the service
cd webui-vue/api && python main.py --port 9198
```

💡 Tip: If npm run build fails, make sure your Node.js version is >= 18 (check with node -v).


🖥️ Command Line Usage (Advanced)

Besides the Web UI, you can use the command line directly:

Generate Cache

```bash
# Generate latent cache (VAE encoding)
python -m zimage_trainer.cache_latents \
    --model_path ./zimage_models \
    --dataset_path ./datasets/your_dataset \
    --output_dir ./datasets/your_dataset

# Generate text cache (text encoding)
python -m zimage_trainer.cache_text_encoder \
    --text_encoder ./zimage_models/text_encoder \
    --input_dir ./datasets/your_dataset \
    --output_dir ./datasets/your_dataset \
    --max_length 512  # optional: 256/512/1024, default 512
```
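
Before caching, it can help to sanity-check the dataset folder. The sketch below assumes the common image-plus-sidecar-.txt caption layout (an assumption based on typical LoRA trainers, not a documented requirement of this project):

```python
# Hypothetical pre-cache check: every image should have a same-name .txt caption.
from pathlib import Path

dataset = Path("./datasets/your_dataset")
images = [p for p in dataset.iterdir()
          if p.suffix.lower() in {".png", ".jpg", ".jpeg", ".webp"}]
missing = [p.name for p in images if not p.with_suffix(".txt").exists()]
print(f"{len(images)} images, {len(missing)} missing captions")
if missing:
    print("no caption for:", ", ".join(missing[:10]))
```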

Start Training

First, copy an example config and modify the paths:

```bash
# Z-Image training
cp config/acrf_config.toml config/my_zimage_config.toml
# Edit my_zimage_config.toml: set [model].dit and [[dataset.sources]].cache_directory

# LongCat-Image training
cp config/longcat_turbo_config.toml config/my_longcat_config.toml
# Edit my_longcat_config.toml: set [model].dit and [[dataset.sources]].cache_directory
```

Then start training:

```bash
# Z-Image training (accelerate recommended)
python -m accelerate.commands.launch --mixed_precision bf16 \
    scripts/train_zimage_v2.py --config config/my_zimage_config.toml

# LongCat-Image training
python -m accelerate.commands.launch --mixed_precision bf16 \
    scripts/train_longcat.py --config config/my_longcat_config.toml
```

⚠️ Important: you must set these paths in the config:

  • [model].dit - Transformer model path
  • [model].output_dir - Output directory
  • [[dataset.sources]].cache_directory - Dataset cache path
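
As a rough sizing aid, optimizer steps per run follow directly from the dataset size and config (gradient_accumulation_steps is mentioned in the FAQ below; the names here are illustrative):

```python
# Back-of-envelope step count for a training run.
import math

num_images = 120   # your dataset size
batch_size = 1     # [dataset].batch_size
grad_accum = 4     # gradient_accumulation_steps, if you set it
epochs = 10        # [training].num_train_epochs

steps_per_epoch = math.ceil(num_images / (batch_size * grad_accum))
print("optimizer steps per epoch:", steps_per_epoch)
print("total optimizer steps:", steps_per_epoch * epochs)
```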

Inference

```bash
# Load a LoRA and generate an image
python -m zimage_trainer.inference \
    --model_path ./zimage_models \
    --lora_path ./output/your_lora.safetensors \
    --prompt "your prompt here" \
    --output_path ./output/generated.png \
    --num_inference_steps 10
```

Start Web UI Service

```bash
# Method 1: use the scripts
./start.sh   # Linux/Mac
start.bat    # Windows

# Method 2: direct start
cd webui-vue/api
python main.py --port 9198 --host 0.0.0.0

# Method 3: uvicorn (hot reload)
cd webui-vue/api
uvicorn main:app --port 9198 --reload
```

⚙️ Configuration

Environment Variables (.env)

```bash
# Service config
TRAINER_PORT=9198              # Web UI port
TRAINER_HOST=0.0.0.0           # Listen address

# Model paths
MODEL_PATH=./zimage_models

# Dataset path
DATASET_PATH=./datasets

# Ollama config
OLLAMA_HOST=http://127.0.0.1:11434
```

Training Parameters (config/acrf_config.toml)

```toml
[acrf]
turbo_steps = 10        # Anchor count (inference steps)
shift = 3.0             # Z-Image official value
jitter_scale = 0.02     # Anchor jitter

[lora]
network_dim = 16        # LoRA rank
network_alpha = 16      # LoRA alpha

[training]
learning_rate = 1e-4    # Learning rate
num_train_epochs = 10   # Training epochs
snr_gamma = 5.0         # Min-SNR weighting
loss_mode = "standard"  # Loss mode (see below)

[dataset]
batch_size = 1
enable_bucket = true
max_sequence_length = 512  # Text sequence length (must match the cache)
```
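
For intuition on snr_gamma: Min-SNR weighting caps each timestep's loss weight so easy (high-SNR) steps don't dominate training. A sketch of the usual min(SNR, γ)/SNR scheme, with the SNR derived from the flow sigma (the project's exact formulation may differ):

```python
# Illustrative Min-SNR weight; not the project's actual code.
import torch

def min_snr_weight(sigma: torch.Tensor, gamma: float = 5.0) -> torch.Tensor:
    snr = ((1.0 - sigma) / sigma) ** 2        # assumed SNR for rectified flow
    return torch.clamp(snr, max=gamma) / snr  # caps the weight of easy steps

sigmas = torch.tensor([0.1, 0.3, 0.5, 0.7, 0.9])
print(min_snr_weight(sigmas))  # low-sigma (high-SNR) steps get down-weighted
```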

🎨 Loss Modes

| Mode | Description | Use Case |
| --- | --- | --- |
| standard | Basic MSE + optional FFT/Cosine | General training |
| frequency | Frequency-aware (HF L1 + LF Cosine) | Sharpening details |
| style | Style-structure (SSIM + Lab stats) | Learning lighting/color style |
| unified | Frequency + Style combined | Full enhancement |

💡 Beginners: start with standard mode, and try the others if you're unsatisfied with the results.

📐 Freq Sub-parameters

| Parameter | Default | Function | Recommended |
| --- | --- | --- | --- |
| alpha_hf | 1.0 | High-frequency (texture/edge) enhancement | 0.5 ~ 1.5 |
| beta_lf | 0.2 | Low-frequency (structure/lighting) lock | 0.1 ~ 0.5 |

Scenario Configs:

| Scenario | alpha_hf | beta_lf | Notes |
| --- | --- | --- | --- |
| Sharpen Details | 1.0~1.5 | 0.1 | Focus on textures |
| Keep Structure | 0.5 | 0.3~0.5 | Prevent composition shift |
| ⭐ Balanced | 0.8 | 0.2 | Recommended default |
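
To make alpha_hf and beta_lf concrete, here is a hedged sketch of what a frequency-aware loss can look like: an FFT low-pass mask splits prediction and target into bands, L1 on the high band sharpens texture, and cosine distance on the low band locks structure. The function name, cutoff, and details are illustrative, not the project's implementation:

```python
# Sketch of an HF-L1 + LF-cosine loss on (B, C, H, W) tensors.
import torch
import torch.nn.functional as F

def freq_loss(pred, target, alpha_hf=0.8, beta_lf=0.2, cutoff=0.25):
    fp = torch.fft.fftshift(torch.fft.fft2(pred.float()), dim=(-2, -1))
    ft = torch.fft.fftshift(torch.fft.fft2(target.float()), dim=(-2, -1))
    h, w = pred.shape[-2:]
    yy, xx = torch.meshgrid(
        torch.linspace(-1, 1, h, device=pred.device),
        torch.linspace(-1, 1, w, device=pred.device), indexing="ij")
    lowpass = ((yy**2 + xx**2).sqrt() < cutoff).float()  # centered disk mask

    hf_l1 = ((fp - ft) * (1 - lowpass)).abs().mean()     # texture / edges
    lf_cos = 1 - F.cosine_similarity(                    # structure / lighting
        (fp * lowpass).flatten(1).abs(),
        (ft * lowpass).flatten(1).abs(), dim=1).mean()
    return alpha_hf * hf_l1 + beta_lf * lf_cos
```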

🎨 Style Sub-parameters

| Parameter | Default | Function | Recommended |
| --- | --- | --- | --- |
| lambda_struct | 1.0 | SSIM structure lock (prevents face collapse) | 0.5 ~ 1.5 |
| lambda_light | 0.5 | L-channel stats (learns lighting curves) | 0.3 ~ 1.0 |
| lambda_color | 0.3 | ab-channel stats (learns color preference) | 0.1 ~ 0.5 |
| lambda_tex | 0.5 | High-frequency L1 (texture enhancement) | 0.3 ~ 0.8 |

Scenario Configs:

| Scenario | struct | light | color | tex | Notes |
| --- | --- | --- | --- | --- | --- |
| Portrait | 1.5 | 0.3 | 0.2 | 0.3 | Strong structure lock |
| Style Transfer | 0.5 | 0.8 | 0.5 | 0.3 | Focus on lighting/color |
| Detail Enhancement | 0.8 | 0.3 | 0.2 | 0.8 | Sharpen textures |
| ⭐ Balanced | 1.0 | 0.5 | 0.3 | 0.5 | Recommended default |
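
Similarly, a sketch of the lighting/color statistic terms: match per-channel mean and std in Lab space, with the L channel carrying lighting and the ab channels carrying color. The kornia dependency and all names are assumptions for illustration, not the project's code:

```python
# Sketch of Lab mean/std matching for lambda_light and lambda_color.
import torch
from kornia.color import rgb_to_lab  # assumed helper; any RGB->Lab conversion works

def lab_stats_loss(pred_rgb, target_rgb, lambda_light=0.5, lambda_color=0.3):
    lab_p, lab_t = rgb_to_lab(pred_rgb), rgb_to_lab(target_rgb)  # (B, 3, H, W)
    mp, sp = lab_p.mean(dim=(-2, -1)), lab_p.std(dim=(-2, -1))
    mt, st = lab_t.mean(dim=(-2, -1)), lab_t.std(dim=(-2, -1))
    light = (mp[:, :1] - mt[:, :1]).abs().mean() + (sp[:, :1] - st[:, :1]).abs().mean()
    color = (mp[:, 1:] - mt[:, 1:]).abs().mean() + (sp[:, 1:] - st[:, 1:]).abs().mean()
    return lambda_light * light + lambda_color * color
```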

⚠️ Note: When both Freq and Style modes are enabled, the high-frequency penalties (alpha_hf and lambda_tex) overlap; consider reducing one of them.

Hardware Tiers

| Tier | VRAM | GPU Examples | Auto Optimization |
| --- | --- | --- | --- |
| S | 32GB+ | A100 / H100 / 5090 | Full performance |
| A | 24GB | 3090 / 4090 | High performance, native SDPA |
| B | 16GB | 4080 / 4070 Ti | Balanced mode |
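
The tier is keyed off total VRAM; a sketch of the detection (thresholds from the table above; the project's actual auto-configuration logic may differ):

```python
# Map detected VRAM to a tier, mirroring the table above.
import torch

if torch.cuda.is_available():
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    tier = "S" if vram_gb >= 32 else "A" if vram_gb >= 24 else "B"
    print(f"{torch.cuda.get_device_name(0)}: {vram_gb:.0f} GB -> Tier {tier}")
```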

📊 Workflow

| Step | Function | Description |
| --- | --- | --- |
| 1️⃣ | Dataset | Import images, Ollama AI captioning |
| 2️⃣ | Cache | Pre-compute latent and text embeddings |
| 3️⃣ | Train | AC-RF LoRA fine-tuning |
| 4️⃣ | Generate | Load LoRA and test results |

🔧 FAQ

Q: Loss fluctuates a lot (0.08-0.6)?

A: That's normal. Different sigma values have different prediction difficulty; watch whether the EMA loss trends downward overall.
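
For reference, an EMA curve is just an exponentially weighted running average of the raw loss (the decay value here is illustrative, not necessarily what the UI uses):

```python
# Smooth a noisy loss series the way an EMA curve does.
def ema_update(ema: float, loss: float, decay: float = 0.99) -> float:
    return decay * ema + (1 - decay) * loss

ema = 0.3  # in practice, initialize with the first loss value
for loss in [0.08, 0.55, 0.12, 0.40, 0.09]:
    ema = ema_update(ema, loss)
print(f"EMA after 5 steps: {ema:.3f}")
```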

Q: CUDA Out of Memory?

A: Try these methods (see the VRAM check sketch after the list):

  • Increase gradient_accumulation_steps (e.g., 4 → 8)
  • Reduce network_dim (e.g., 32 → 16)
  • Ensure Flash Attention is installed
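
To see how close you are to the limit while training, standard PyTorch memory counters suffice (this is the sketch referenced above):

```python
# Report current and peak VRAM usage on the default GPU.
import torch

gib = 1024**3
print(f"allocated: {torch.cuda.memory_allocated() / gib:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / gib:.2f} GiB")
print(f"peak:      {torch.cuda.max_memory_allocated() / gib:.2f} GiB")
```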
Q: How many epochs?

A: Depends on dataset size:

  • < 50 images: 10-15 epochs
  • 50-200 images: 8-10 epochs
  • > 200 images: 5-8 epochs

📬 Contact


📝 License

Apache 2.0

🙏 Acknowledgements


Made with ❤️ by None
