Project | Hugging Face | ModelScope | Space Demo | Discord | Technical Report
We introduce ACE-Step, a novel open-source foundation model for music generation that overcomes key limitations of existing approaches and achieves state-of-the-art performance through a holistic architectural design. Current methods face inherent trade-offs between generation speed, musical coherence, and controllability. For instance, LLM-based models (e.g., Yue, SongGen) excel at lyric alignment but suffer from slow inference and structural artifacts. Diffusion models (e.g., DiffRhythm), on the other hand, enable faster synthesis but often lack long-range structural coherence.
ACE-Step bridges this gap by integrating diffusion-based generation with Sana’s Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer. It further leverages MERT and m-hubert to align semantic representations (REPA) during training, enabling rapid convergence. As a result, our model synthesizes up to 4 minutes of music in just 20 seconds on an A100 GPU—15× faster than LLM-based baselines—while achieving superior musical coherence and lyric alignment across melody, harmony, and rhythm metrics. Moreover, ACE-Step preserves fine-grained acoustic details, enabling advanced control mechanisms such as voice cloning, lyric editing, remixing, and track generation (e.g., lyric2vocal, singing2accompaniment).
Rather than building yet another end-to-end text-to-music pipeline, our vision is to establish a foundation model for music AI: a fast, general-purpose, efficient yet flexible architecture that makes it easy to train sub-tasks on top of it. This paves the way for developing powerful tools that seamlessly integrate into the creative workflows of music artists, producers, and content creators. In short, we aim to build the Stable Diffusion moment for music.
📃 2025.06.02: Released ACE-Step Technical Report (PDF).
🎮 2025.05.14: Add the "ping-pong" sampler from Stable Audio Open Small. Use SDE sampling to achieve better music consistency and quality, including lyric alignment and style alignment. Re-implement Audio2Audio with a better method.
🎤 2025.05.12: Release RapMachine and fix LoRA training issues
acestep --torch_compile true --cpu_offload true --overlapped_decode true
Windows users need to install Triton:
pip install triton-windows

🚀 2025.05.08: ComfyUI_ACE-Step node is now available! Explore the power of ACE-Step within ComfyUI. 🎉
🚀 2025.05.06: Open source demo code and model
We have evaluated ACE-Step across different hardware setups, yielding the following throughput results:
| Device | RTF (27 steps) | Time to render 1 min audio (27 steps) | RTF (60 steps) | Time to render 1 min audio (60 steps) |
|---|---|---|---|---|
| NVIDIA RTX 4090 | 34.48 × | 1.74 s | 15.63 × | 3.84 s |
| NVIDIA A100 | 27.27 × | 2.20 s | 12.27 × | 4.89 s |
| NVIDIA RTX 3090 | 12.76 × | 4.70 s | 6.48 × | 9.26 s |
| MacBook M2 Max | 2.27 × | 26.43 s | 1.03 × | 58.25 s |
We use RTF (Real-Time Factor) to measure ACE-Step's generation speed; higher values indicate faster generation. For example, an RTF of 27.27× (A100, 27 steps) means that generating 1 minute of music takes about 2.2 seconds (60 / 27.27). All measurements use a single device with batch size 1.
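The relationship between RTF and render time is simple arithmetic. The sketch below (illustrative only; the RTF values are copied from the table above) reproduces the "time to render 1 min audio" column:

```python
# Illustrative only: recompute the "time to render 1 min audio" column from the 27-step RTF values above.
rtf_27_steps = {
    "NVIDIA RTX 4090": 34.48,
    "NVIDIA A100": 27.27,
    "NVIDIA RTX 3090": 12.76,
    "MacBook M2 Max": 2.27,
}

for device, rtf in rtf_27_steps.items():
    render_seconds = 60.0 / rtf  # RTF = audio duration / wall-clock generation time
    print(f"{device}: {render_seconds:.2f} s to render 1 minute of audio")
```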
First, clone the ACE-Step repository to your local machine and navigate into the project directory:
git clone https://github.com/ace-step/ACE-Step.git
cd ACE-Step
Ensure you have the following installed:
- Python: Version 3.10 or later is recommended. You can download it from python.org.
- Conda or venv: For creating a virtual environment (Conda is recommended).

It is highly recommended to use a virtual environment to manage project dependencies and avoid conflicts. Choose one of the following methods:
Create the environment named ace_step with Python 3.10:
conda create -n ace_step python=3.10 -y
Activate the environment:
conda activate ace_step
Navigate to the cloned ACE-Step directory.
Create the virtual environment (commonly named venv):
python -m venv venv
Activate the environment:
On Windows (Command Prompt):
venv\Scripts\activate.bat
On Windows (PowerShell):
.\venv\Scripts\Activate.ps1
(If you encounter execution policy errors, you might need to run Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope Process first.)
On macOS/Linux:
source venv/bin/activate
Once your virtual environment is activated: a. (Windows only) If you plan to use an NVIDIA GPU, install PyTorch with CUDA support first:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
(Adjust cu126 if you have a different CUDA version. For other PyTorch installation options, refer to the official PyTorch website).
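As an optional sanity check (not part of the official instructions), you can confirm that the CUDA-enabled build was installed correctly:

```python
import torch

# Should print a CUDA-enabled version string (e.g. ending in "+cu126") and True on a working GPU setup.
print(torch.__version__)
print(torch.cuda.is_available())
```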
b. Install ACE-Step and its core dependencies:
pip install -e .
The ACE-Step application is now installed. The GUI works on Windows, macOS, and Linux. For instructions on how to run it, please see the Usage section.

acestep --port 7865
acestep --checkpoint_path /path/to/checkpoint --port 7865 --device_id 0 --share true --bf16 true
- If --checkpoint_path is set and models exist at the path, they are loaded from checkpoint_path.
- If --checkpoint_path is set but models do not exist at the path, they are automatically downloaded to checkpoint_path.
- If --checkpoint_path is not set, models are automatically downloaded to the default path ~/.cache/ace-step/checkpoints.

If you are using macOS, please use --bf16 false to avoid errors.
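The resolution behavior above amounts to a simple fallback. The sketch below is illustrative only (the helper name is hypothetical and this is not the project's actual implementation):

```python
import os

def resolve_checkpoint_dir(checkpoint_path: str | None) -> tuple[str, bool]:
    """Hypothetical sketch of the checkpoint resolution described above; not the actual implementation."""
    default_dir = os.path.expanduser("~/.cache/ace-step/checkpoints")
    target = checkpoint_path or default_dir
    # A download is needed when the target directory is missing or empty.
    needs_download = not os.path.isdir(target) or not os.listdir(target)
    return target, needs_download
```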
If you intend to integrate ACE-Step as a library into your own Python projects, you can install the latest version directly from GitHub using the following pip command.
Direct Installation via pip:
It's recommended to use this command within a virtual environment to avoid conflicts with other packages.

pip install git+https://github.com/ace-step/ACE-Step.git
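For programmatic use, library calls typically mirror the GUI's inputs (style prompt, lyrics, duration, inference steps). The snippet below is an assumption-laden sketch: the import path, class name, constructor, and argument names are guesses modeled on the GUI fields and may not match the actual API; consult the repository's demo code for the real interface.

```python
# Assumption-laden sketch: the import path, class name, and argument names below are guesses
# modeled on the GUI fields; consult the repository's demo code for the actual API.
from acestep.pipeline_ace_step import ACEStepPipeline

pipeline = ACEStepPipeline()  # hypothetical default construction; may need a checkpoint path

pipeline(
    prompt="lo-fi, chill, mellow piano, 90 bpm",  # style tags / text prompt
    lyrics="[verse]\n...",                         # optional structured lyrics
    audio_duration=60.0,                           # seconds
    infer_step=27,
    save_path="output.wav",
)
```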
- --checkpoint_path: Path to the model checkpoint (default: downloads automatically)
- --server_name: IP address or hostname for the Gradio server to bind to (default: 127.0.0.1). Use 0.0.0.0 to make it accessible from other devices on the network.
- --port: Port to run the Gradio server on (default: 7865)
- --device_id: GPU device ID to use (default: 0)
- --share: Enable Gradio sharing link (default: False)
- --bf16: Use bfloat16 precision for faster inference (default: True)
- --torch_compile: Use torch.compile() to optimize the model and speed up inference (default: False). On Windows, this additionally requires Triton:
pip install triton-windows
- --cpu_offload: Offload model weights to CPU to save GPU memory (default: False)
- --overlapped_decode: Use overlapped decoding to speed up inference (default: False)

The ACE-Step interface provides several tabs for different music generation and editing tasks:
📋 Input Fields:
⚙️ Settings:
🚀 Generation: Click "Generate" to create music based on your inputs
The examples/input_params directory contains sample input parameters that can be used as references for generating music.
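If you want to reuse those presets programmatically, a small script can inspect them. This assumes the sample parameters are stored as JSON files (adjust the glob pattern if the format differs):

```python
import json
from pathlib import Path

# Assumes the sample parameters are stored as JSON files; adjust if the format differs.
for path in sorted(Path("examples/input_params").glob("*.json")):
    params = json.loads(path.read_text(encoding="utf-8"))
    print(path.name, "->", sorted(params.keys()))
```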
See TRAIN_INSTRUCTION.md for detailed training instructions.
This project is licensed under the Apache License 2.0.
ACE-Step enables original music generation across diverse genres, with applications in creative production, education, and entertainment. While designed to support positive and artistic use cases, we acknowledge potential risks such as unintentional copyright infringement due to stylistic similarity, inappropriate blending of cultural elements, and misuse for generating harmful content. To ensure responsible use, we encourage users to verify the originality of generated works, clearly disclose AI involvement, and obtain appropriate permissions when adapting protected styles or materials. By using ACE-Step, you agree to uphold these principles and respect artistic integrity, cultural diversity, and legal compliance. The authors are not responsible for any misuse of the model, including but not limited to copyright violations, cultural insensitivity, or the generation of harmful content.
🔔 Important Notice
The only official website for the ACE-Step project is our GitHub Pages site.
We do not operate any other websites.
🚫 Fake domains include but are not limited to:
ac**p.com, a**p.org, a***c.org
⚠️ Please be cautious. Do not visit, trust, or make payments on any of those sites.
This project is co-led by ACE Studio and StepFun.
If you find this project useful for your research, please consider citing:
@misc{gong2025acestep,
  title={ACE-Step: A Step Towards Music Generation Foundation Model},
  author={Junmin Gong, Wenxiao Zhao, Sen Wang, Shengyuan Xu, Jing Guo},
  howpublished={\url{https://github.com/ace-step/ACE-Step}},
  year={2025},
  note={GitHub repository}
}