DOI: 10.1145/3581783.3612215

High Fidelity Face Swapping via Semantics Disentanglement and Structure Enhancement

Published: 27 October 2023

ABSTRACT

In this paper, we propose a novel Semantics and Structure-aware face Swapping framework (S2Swap) that exploits semantics disentanglement and structure enhancement for high-fidelity face generation. Unlike previous methods that either 1) suffer from degraded generation fidelity due to insufficient identity-attribute disentanglement or 2) neglect the importance of structure information for identity consistency, our approach achieves local facial semantics disentanglement beyond global identity while boosting identity consistency through structure enhancement. Specifically, to achieve identity-attribute disentanglement, S2Swap is designed from both global and local perspectives. First, an Oriented Identity Transfer module globally disentangles target identity and attributes under a global identity semantics prior; this global disentanglement enables the source identity to be transferred to the individual target identity. Second, a Local Semantics Disentanglement module disentangles local identity from identity-irrelevant facial semantics, providing local semantic compensation for its global counterpart. Moreover, to boost identity consistency, a Structure-Aware Head Modeling module provides the desired face structure enhancement through an intuitive face sketch. Finally, considering the identity-attribute trade-off, we adaptively integrate semantics and structure information in a self-learning manner. Extensive experiments show, both qualitatively and quantitatively, that our method outperforms state-of-the-art face swapping methods in terms of both identity transfer and attribute preservation.
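The abstract names four components: an Oriented Identity Transfer (OIT) module, a Local Semantics Disentanglement (LSD) module, a Structure-Aware Head Modeling (SAHM) module, and an adaptive, self-learned integration of semantics and structure. The following is a minimal sketch of one way these pieces could be wired together in PyTorch. The module names follow the abstract, but every internal detail (feature dimensions, layer choices, the sigmoid-gated fusion, and the omission of the final generator) is a hypothetical assumption for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class OrientedIdentityTransfer(nn.Module):
    """Global identity-attribute disentanglement and source-identity injection (assumed design)."""

    def __init__(self, feat_dim=512):
        super().__init__()
        self.mix = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, src_id, tgt_global):
        # src_id, tgt_global: (B, feat_dim) global codes; returns an identity-transferred global code.
        return self.mix(torch.cat([src_id, tgt_global], dim=1))


class LocalSemanticsDisentanglement(nn.Module):
    """Splits local features into identity and identity-irrelevant semantics (assumed design)."""

    def __init__(self, channels=256):
        super().__init__()
        self.id_branch = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.attr_branch = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, local_feat):
        # local_feat: (B, C, H, W) target feature map.
        return self.id_branch(local_feat), self.attr_branch(local_feat)


class StructureAwareHeadModeling(nn.Module):
    """Encodes a face sketch into structure features that enhance identity consistency (assumed design)."""

    def __init__(self, channels=256):
        super().__init__()
        self.encode = nn.Conv2d(1, channels, kernel_size=3, padding=1)

    def forward(self, sketch):
        # sketch: (B, 1, H, W) single-channel face sketch.
        return self.encode(sketch)


class S2SwapSketch(nn.Module):
    """Composes the three modules and fuses semantics/structure with a self-learned gate (assumed fusion)."""

    def __init__(self, feat_dim=512, channels=256):
        super().__init__()
        self.oit = OrientedIdentityTransfer(feat_dim)
        self.lsd = LocalSemanticsDisentanglement(channels)
        self.sahm = StructureAwareHeadModeling(channels)
        self.gate = nn.Parameter(torch.zeros(1))  # learns the semantics-structure trade-off

    def forward(self, src_id, tgt_global, tgt_local, sketch):
        global_code = self.oit(src_id, tgt_global)      # (B, feat_dim)
        local_id, local_attr = self.lsd(tgt_local)      # (B, C, H, W) each
        structure = self.sahm(sketch)                   # (B, C, H, W)
        alpha = torch.sigmoid(self.gate)                # scalar gate in (0, 1)
        fused = alpha * (local_id + local_attr) + (1 - alpha) * structure
        return global_code, fused                       # a generator/decoder would consume these
```

In a complete framework the global code and fused features would drive a face generator that renders the swapped face; this sketch stops at the fused representation.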

Published in

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023, 9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783
      Copyright © 2023 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States
