ABSTRACT
In this paper, we propose a novel Semantics and Structure-aware face Swapping framework (S2Swap) that exploits semantics disentanglement and structure enhancement for high-fidelity face generation. Unlike previous methods, which either (1) suffer from degraded generation fidelity due to insufficient identity-attribute disentanglement or (2) neglect the importance of structure information for identity consistency, our approach achieves local facial semantics disentanglement beyond global identity while boosting identity consistency through structure enhancement. Specifically, to achieve identity-attribute disentanglement, S2Swap is designed from both global and local perspectives. First, an Oriented Identity Transfer module globally disentangles the target identity and attributes under a global identity semantics prior. This global disentanglement enables transfer of the source identity to the individual target identity. Second, a Local Semantics Disentanglement module disentangles local identity semantics from identity-irrelevant facial semantics, providing local semantic compensation for the global counterpart. Moreover, to boost identity consistency, a Structure-Aware Head Modeling module provides the desired face structure enhancement through an intuitive face sketch. Finally, to balance the trade-off between identity and attributes, we adaptively integrate semantics and structure information in a self-learning manner. Extensive qualitative and quantitative experiments show that our method outperforms state-of-the-art face swapping methods in both identity transfer and attribute preservation.
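The abstract does not say how semantics and structure information are adaptively integrated. As a rough illustration only, one common way to blend two feature streams with a learned weight is a sigmoid gate. The sketch below assumes this gating form; the function name `gated_fusion`, the parameters `weights` and `bias`, and the use of a single scalar gate are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    semantic: Sequence[float],
    structure: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> Tuple[List[float], float]:
    """Blend two equal-length feature vectors with one scalar gate.

    ASSUMPTION: this gating form is illustrative, not the paper's method.
    The gate is computed from the concatenated features, so in a trained
    model `weights` and `bias` would be learned rather than set by hand.
    """
    if len(semantic) != len(structure):
        raise ValueError("semantic and structure must have the same length")
    concat = list(semantic) + list(structure)
    if len(weights) != len(concat):
        raise ValueError("weights must match the concatenated feature length")

    # Scalar gate in (0, 1): values near 1 favour the semantic features,
    # values near 0 favour the structure features.
    gate = sigmoid(sum(w * x for w, x in zip(weights, concat)) + bias)

    # Convex combination of the two streams, element by element.
    fused = [gate * s + (1.0 - gate) * t for s, t in zip(semantic, structure)]
    return fused, gate


if __name__ == "__main__":
    sem = [1.0, 3.0]
    struct = [3.0, 1.0]
    fused, gate = gated_fusion(sem, struct, weights=[0.0] * 4, bias=0.0)
    print(gate, fused)
```

With all-zero weights and zero bias the gate is exactly 0.5, so the output is the element-wise mean of the two inputs. A large positive bias pushes the gate toward 1 and the output toward the semantic features, which is the kind of trade-off a learned gate could adjust per input.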