DOI: 10.1145/3581783.3612215

High Fidelity Face Swapping via Semantics Disentanglement and Structure Enhancement

Published: 27 October 2023

ABSTRACT

In this paper, we propose a novel Semantics and Structure-aware face Swapping framework (S2Swap) that exploits semantics disentanglement and structure enhancement for high-fidelity face generation. Unlike previous methods that either 1) suffer from degraded generation fidelity due to insufficient identity-attribute disentanglement or 2) neglect the importance of structure information for identity consistency, our approach achieves local facial semantics disentanglement beyond global identity while boosting identity consistency through structure enhancement. Specifically, to achieve identity-attribute disentanglement, S2Swap is designed from both global and local perspectives. First, an Oriented Identity Transfer module globally disentangles target identity and attributes under a global identity semantics prior; this global disentanglement enables the source identity to be transferred to the individual target identity. Second, a Local Semantics Disentanglement module disentangles local identity from identity-irrelevant facial semantics, providing local semantic compensation for its global counterpart. Moreover, to boost identity consistency, a Structure-Aware Head Modeling module provides the desired face structure enhancement through an intuitive face sketch. Finally, considering the identity-attribute trade-off, we adaptively integrate semantics and structure information in a self-learning manner. Extensive experiments show, both qualitatively and quantitatively, that our method outperforms state-of-the-art face swapping methods in terms of both identity transfer and attribute preservation.
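The abstract names four components: an Oriented Identity Transfer (OIT) module, a Local Semantics Disentanglement (LSD) module, a Structure-Aware Head Modeling (SAHM) module, and an adaptive, self-learned integration of semantics and structure. The following is a minimal sketch of one way these pieces could be wired together in PyTorch. The module names follow the abstract, but every internal detail (feature dimensions, layer choices, the sigmoid-gated fusion, and the omission of the final generator) is a hypothetical assumption for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class OrientedIdentityTransfer(nn.Module):
    """Global identity-attribute disentanglement and source-identity injection (assumed design)."""

    def __init__(self, feat_dim=512):
        super().__init__()
        self.mix = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, src_id, tgt_global):
        # src_id, tgt_global: (B, feat_dim) global codes; returns an identity-transferred global code.
        return self.mix(torch.cat([src_id, tgt_global], dim=1))


class LocalSemanticsDisentanglement(nn.Module):
    """Splits local features into identity and identity-irrelevant semantics (assumed design)."""

    def __init__(self, channels=256):
        super().__init__()
        self.id_branch = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.attr_branch = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, local_feat):
        # local_feat: (B, C, H, W) target feature map.
        return self.id_branch(local_feat), self.attr_branch(local_feat)


class StructureAwareHeadModeling(nn.Module):
    """Encodes a face sketch into structure features that enhance identity consistency (assumed design)."""

    def __init__(self, channels=256):
        super().__init__()
        self.encode = nn.Conv2d(1, channels, kernel_size=3, padding=1)

    def forward(self, sketch):
        # sketch: (B, 1, H, W) single-channel face sketch.
        return self.encode(sketch)


class S2SwapSketch(nn.Module):
    """Composes the three modules and fuses semantics/structure with a self-learned gate (assumed fusion)."""

    def __init__(self, feat_dim=512, channels=256):
        super().__init__()
        self.oit = OrientedIdentityTransfer(feat_dim)
        self.lsd = LocalSemanticsDisentanglement(channels)
        self.sahm = StructureAwareHeadModeling(channels)
        self.gate = nn.Parameter(torch.zeros(1))  # learns the semantics-structure trade-off

    def forward(self, src_id, tgt_global, tgt_local, sketch):
        global_code = self.oit(src_id, tgt_global)      # (B, feat_dim)
        local_id, local_attr = self.lsd(tgt_local)      # (B, C, H, W) each
        structure = self.sahm(sketch)                   # (B, C, H, W)
        alpha = torch.sigmoid(self.gate)                # scalar gate in (0, 1)
        fused = alpha * (local_id + local_attr) + (1 - alpha) * structure
        return global_code, fused                       # a generator/decoder would consume these
```

In a complete framework the global code and fused features would drive a face generator that renders the swapped face; this sketch stops at the fused representation.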

Published in

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023, 9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783
      Copyright © 2023 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States
