The process has two steps: first, use the ElevenLabs platform to generate the fake voice; then feed that voice, together with a talking-head video, into the video-retalking model to produce the final DeepFake video.
Step 1: DeepFake Audio
In this step, you need a recording of the person whose voice you want to DeepFake. Go to ElevenLabs and use its speech-to-speech service. This may cost about 10 to 20 dollars. At the end of this step, you should have a DeepFake audio file.
Step 2: Audio-based Lip Synchronization
We will use the video-retalking model. If you don't want to run the model locally, you can use a cloud service, such as the video-retalking model hosted on Replicate. Input the fake voice together with a talking-head video into the video-retalking model to produce the final DeepFake video.
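
Before sending the two files to the model, whether locally or on a cloud service, it can help to confirm that both inputs are actually in place. The following is a minimal sketch of such a check using only the Python standard library. The function name `check_inputs` and the extension lists are assumptions made for illustration; they are not part of video-retalking or Replicate, so adjust them to whatever formats your setup accepts.

```python
from pathlib import Path

# Assumed extension lists, for illustration only; check what your tool accepts.
AUDIO_EXTS = {".wav", ".mp3"}
VIDEO_EXTS = {".mp4", ".mov"}


def check_inputs(audio_path, video_path):
    """Return a list of problems; an empty list means both files look usable."""
    problems = []
    for label, raw, allowed in (
        ("audio", audio_path, AUDIO_EXTS),
        ("video", video_path, VIDEO_EXTS),
    ):
        path = Path(raw)
        if not path.is_file():
            # Missing file, or the path points at a directory.
            problems.append(f"{label} file not found: {path}")
        elif path.suffix.lower() not in allowed:
            problems.append(f"{label} file has unexpected extension: {path.suffix}")
        elif path.stat().st_size == 0:
            # A zero-byte file usually means a failed download or export.
            problems.append(f"{label} file is empty: {path}")
    return problems
```

Calling `check_inputs("voice.wav", "talking_head.mp4")` (hypothetical file names) returns an empty list when both files exist, are non-empty, and have one of the assumed extensions; otherwise it returns one message per problem file.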