VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

Published: 30 November 2022


We present VideoReTalking, a new system to edit the faces of a real-world talking head video according to input audio, producing a high-quality, lip-synced output video even with a different emotion. Our system disentangles this objective into three sequential tasks: (1) face video generation with a canonical expression; (2) audio-driven lip-sync; and (3) face enhancement for improving photo-realism. Given a talking-head video, we first modify the expression of each frame according to the same expression template using the expression editing network, resulting in a video with the canonical expression. This video, together with the given audio, is then fed into the lip-sync network to generate a lip-synced video. Finally, we improve the photo-realism of the synthesized faces through an identity-aware face enhancement network and post-processing. We use learning-based approaches for all three steps, and all our modules can be run in a sequential pipeline without any user intervention. Furthermore, our system is a generic approach that does not need to be retrained for a specific person. Evaluations on two widely-used datasets and in-the-wild examples demonstrate the superiority of our framework over other state-of-the-art methods in terms of lip-sync accuracy and visual quality.
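The three sequential tasks described above can be pictured as a simple composition of stages. The sketch below is only an illustration of that ordering, not the authors' implementation: the class name `RetalkingPipeline`, the stage callables, and the toy stand-in functions are all hypothetical, and frames are represented as tuples of labels rather than images.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for an image frame: a tuple of labels recording
# which stages have touched it. Real frames would be image arrays.
Frame = Tuple[str, ...]


@dataclass
class RetalkingPipeline:
    """Illustrative three-stage pipeline; all names here are assumptions."""
    neutralize_expression: Callable[[Frame], Frame]
    lip_sync: Callable[[Sequence[Frame], Sequence[str]], List[Frame]]
    enhance_face: Callable[[Frame], Frame]

    def run(self, frames: Sequence[Frame], audio: Sequence[str]) -> List[Frame]:
        # (1) Edit every frame toward the same canonical expression.
        canonical = [self.neutralize_expression(f) for f in frames]
        # (2) Drive the mouth region from the audio, using the whole clip.
        synced = self.lip_sync(canonical, audio)
        # (3) Enhance each synthesized face for photo-realism.
        return [self.enhance_face(f) for f in synced]


# Toy stages that only append a label, so the stage order is visible.
def toy_neutralize(frame: Frame) -> Frame:
    return frame + ("canonical_expression",)


def toy_lip_sync(frames: Sequence[Frame], audio: Sequence[str]) -> List[Frame]:
    # Pairs each frame with one audio chunk; a real model would use a window.
    return [f + (f"lip_sync[{a}]",) for f, a in zip(frames, audio)]


def toy_enhance(frame: Frame) -> Frame:
    return frame + ("enhanced",)


pipeline = RetalkingPipeline(toy_neutralize, toy_lip_sync, toy_enhance)
result = pipeline.run([("frame0",), ("frame1",)], ["audio0", "audio1"])
for frame in result:
    print(frame)
```

Because each stage is passed in as a plain callable, a learned network could be swapped in for any toy function without changing the orchestration, which mirrors the claim that the modules run in sequence without user intervention.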



      SA '22: SIGGRAPH Asia 2022 Conference Papers
      November 2022


      Author Tags

      1. Audio-driven Generation
      2. Facial Animation
      3. Video Synthesis


