
Heygem is a fully offline video synthesis tool for Windows that precisely clones your appearance and voice, turning you into a digital avatar. You can then drive that avatar with text and audio to produce videos. No internet connection is required, so your privacy is protected while you enjoy a convenient and efficient digital experience.

NVIDIA driver download link: https://www.nvidia.cn/drivers/lookup/
Use the command wsl --list --verbose to check whether WSL is installed. If the output already lists an installed distribution, there is no need to reinstall.

- WSL installation command: wsl --install (this may fail due to network issues; retry a few times if it does)
- During installation you will be asked to set a new username and password; remember them
- Update WSL with wsl --update

Download Docker for Windows, choosing the installation package that matches your CPU architecture.

Run Docker once the installation completes.

On first run, accept the agreement; logging in can be skipped.

Install the services with Docker using docker-compose:
- The docker-compose.yml file is in the /deploy directory.
- Execute docker-compose up -d in the /deploy directory.
- Wait patiently (about half an hour, depending on connection speed); the download is roughly 70 GB, so make sure to use WiFi rather than a metered connection.
- When three services appear in Docker, the installation succeeded.

Run npm run build:win; after it finishes, HeyGem-1.0.0-setup.exe is generated in the dist directory. Run HeyGem-1.0.0-setup.exe to install.

We provide APIs for model training and video synthesis. After Docker starts, several ports are exposed locally and are accessible through http://127.0.0.1. For the specific calling code, refer to the project source.

Place the reference audio in D:\heygem_data\voice\data. This path is the one agreed with the guiji2025/fish-speech-ziming service and can be modified in docker-compose.
Model training interface: http://127.0.0.1:18180/v1/preprocess_and_tran
Parameter example:
{
  "format": ".wav",
  "reference_audio": "xxxxxx/xxxxx.wav",
  "lang": "zh"
}
Response example:
{
  "asr_format_audio_url": "xxxx/x/xxx/xxx.wav",
  "reference_audio_text": "xxxxxxxxxxxx"
}
Record the response values; they are needed later for audio synthesis.
Audio synthesis interface: http://127.0.0.1:18180/v1/invoke
// Request parameters
{
  "speaker": "{uuid}",        // a unique UUID
  "text": "xxxxxxxxxx",       // text content to synthesize
  "format": "wav",            // fixed parameter
  "topP": 0.7,                // fixed parameter
  "max_new_tokens": 1024,     // fixed parameter
  "chunk_length": 100,        // fixed parameter
  "repetition_penalty": 1.2,  // fixed parameter
  "temperature": 0.7,         // fixed parameter
  "need_asr": false,          // fixed parameter
  "streaming": false,         // fixed parameter
  "is_fixed_seed": 0,         // fixed parameter
  "is_norm": 0,               // fixed parameter
  "reference_audio": "{voice.asr_format_audio_url}", // asr_format_audio_url from the model-training response
  "reference_text": "{voice.reference_audio_text}"   // reference_audio_text from the model-training response
}
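A matching sketch for the synthesis call, under the same assumptions (Node 18+ fetch, JSON POST). Treating the response body as the synthesized WAV bytes is an assumption here, so verify it against your running service:

import { randomUUID } from "node:crypto";
import { writeFile } from "node:fs/promises";

// Sketch of the audio-synthesis call. voice is the object returned by
// trainVoice() above. Writing the raw response body to output.wav assumes
// the service returns WAV bytes; verify against your deployment.
async function synthesizeAudio(
  voice: { asr_format_audio_url: string; reference_audio_text: string },
  text: string,
): Promise<void> {
  const res = await fetch("http://127.0.0.1:18180/v1/invoke", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      speaker: randomUUID(), // a unique UUID
      text,                  // text content to synthesize
      format: "wav",
      topP: 0.7,
      max_new_tokens: 1024,
      chunk_length: 100,
      repetition_penalty: 1.2,
      temperature: 0.7,
      need_asr: false,
      streaming: false,
      is_fixed_seed: 0,
      is_norm: 0,
      reference_audio: voice.asr_format_audio_url,
      reference_text: voice.reference_audio_text,
    }),
  });
  await writeFile("output.wav", Buffer.from(await res.arrayBuffer()));
}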
Video synthesis interface: http://127.0.0.1:8383/easy/submit
// Request parameters
{
  "audio_url": "{audioPath}",  // audio path
  "video_url": "{videoPath}",  // video path
  "code": "{uuid}",            // unique key
  "chaofen": 0,                // fixed value
  "watermark_switch": 0,       // fixed value
  "pn": 1                      // fixed value
}
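A sketch of submitting a task, again assuming a JSON POST; audioPath and videoPath are hypothetical local paths visible to the Docker services:

import { randomUUID } from "node:crypto";

// Sketch of submitting a video-synthesis task. The returned code is the
// unique key you pass to the query interface below.
async function submitVideoTask(audioPath: string, videoPath: string): Promise<string> {
  const code = randomUUID();
  await fetch("http://127.0.0.1:8383/easy/submit", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      audio_url: audioPath,  // audio path
      video_url: videoPath,  // video path
      code,                  // unique key
      chaofen: 0,
      watermark_switch: 0,
      pn: 1,
    }),
  });
  return code;
}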
Task status interface: http://127.0.0.1:8383/easy/query?code=${taskCode}
GET request; the taskCode parameter is the return value (code) from the synthesis interface above.
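And a single status check to round out the flow. Since the response shape is not documented above, the JSON is returned as-is; you would call this repeatedly (for example every few seconds) until the task reports completion. queryVideoTask is a hypothetical helper name:

// Sketch of one status poll against the query interface.
// The response shape is not documented above, so it is returned untyped.
async function queryVideoTask(taskCode: string): Promise<unknown> {
  const res = await fetch(`http://127.0.0.1:8383/easy/query?code=${taskCode}`);
  return res.json();
}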