A real-time interactive streaming digital human that enables synchronous audio and video dialogue. It can achieve close to commercial-grade results.
Demo videos: wav2lip | ernerf | musetalk
Tested on Ubuntu 20.04, Python 3.10, PyTorch 1.12, and CUDA 11.3
conda create -n nerfstream python=3.10
conda activate nerfstream
# If your CUDA version is not 11.3 (check with nvidia-smi), install the matching PyTorch build from https://pytorch.org/get-started/previous-versions/
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
# If you need to train the ernerf model, install the following libraries
# pip install "git+https://github.com/facebookresearch/pytorch3d.git"
# pip install tensorflow-gpu==2.8.0
# pip install --upgrade "protobuf<=3.20.1"
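Before moving on, you can sanity-check the environment (an optional check, not one of the official steps):
# Should print the PyTorch version and True if CUDA is usable
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"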
Common installation issues FAQ
For setting up the Linux CUDA environment, you can refer to this article: https://zhuanlan.zhihu.com/p/674972886
Download the models
Quark Cloud Disk https://pan.quark.cn/s/83a750323ef0
Google Drive https://drive.google.com/drive/folders/1FOC_MD6wdogyyX_7V1d4NDIO7P9NlSAJ?usp=sharing
Copy wav2lip256.pth to the models folder of this project and rename it to wav2lip.pth;
Extract wav2lip256_avatar1.tar.gz and copy the entire folder to the data/avatars folder of this project.
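For example, assuming the downloads are in the current directory and you are in the project root (a sketch; adjust paths to your setup):
# Place the checkpoint and avatar data where the app expects them
cp wav2lip256.pth models/wav2lip.pth
tar -xzf wav2lip256_avatar1.tar.gz
cp -r wav2lip256_avatar1 data/avatars/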
Run
python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar1
Open http://serverip:8010/webrtcapi.html in a browser. First click 'start' to play the digital human video; then enter any text in the text box and submit it. The digital human will speak the text.
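You can also send text without the web page. A minimal sketch, assuming the server exposes the same POST /human endpoint the demo page uses (the exact route and JSON fields may differ by version; check webrtcapi.html for what your build expects):
curl http://serverip:8010/human -H "Content-Type: application/json" -d '{"sessionid": 0, "type": "echo", "interrupt": true, "text": "Hello, this is a test."}'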
The server needs to open ports tcp:8010 and udp:1-65536.
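On Ubuntu with ufw, for example, the rules could look like this (a sketch; adapt to your firewall):
sudo ufw allow 8010/tcp
# WebRTC negotiates a random UDP media port, hence the wide range
sudo ufw allow 1:65535/udp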
To purchase a high-definition wav2lip model for commercial use: Link.
Quick experience
Create an instance with this image to run it: https://www.compshare.cn/images-detail?ImageID=compshareImage-18tpjhhxoq3j&referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_GitHub_livetalking1.3
If you can't access huggingface, run the following before starting:
export HF_ENDPOINT=https://hf-mirror.com
Usage instructions: https://livetalking-doc.readthedocs.io/en/latest
Run with Docker
No need for the installation above; just run directly.
docker run --gpus all -it --network=host --rm registry.cn-beijing.aliyuncs.com/codewithgpu2/lipku-metahuman-stream:2K9qaMBu8v
The code is in /root/metahuman-stream. First run git pull to fetch the latest code, then execute the commands from the Download the models and Run sections above.
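Inside the container, the workflow is roughly (reusing the run command from above):
cd /root/metahuman-stream
# Fetch the latest code, then start the server as in the Run section
git pull
python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar1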
The following images are provided:
If this project is helpful to you, please give it a star. Anyone interested is also welcome to join in and improve it together.