Abstract
Advanced face swapping approaches have achieved high-fidelity results. However, the success of most methods hinges on large parameter counts and high computational costs, which have become obstacles to swap speed and deployment as real-time face swapping grows in popularity. To overcome these challenges, we propose a high-fidelity lightweight generator (HFLG) for face swapping, a compressed version of the existing SimSwap network that uses only 1/4 of its channels. Moreover, to stabilize the training of HFLG, we introduce feature-map-based online knowledge distillation into our training process and improve the teacher–student architecture. Specifically, we first enhance the teacher generator to provide more effective guidance, minimizing the loss of detail on the lower face. In addition, a new identity-irrelevant similarity loss is proposed to improve the preservation of non-facial regions in the teacher generator's results. Furthermore, HFLG uses an extended identity injection module to inject identity more efficiently, and it gradually learns face swapping by imitating the feature maps and outputs of the teacher generator online. Extensive experiments on faces in the wild demonstrate that our method achieves results comparable to other methods while having fewer parameters, lower computational cost, and faster inference. The code is available at https://github.com/EifelTing/HFLFS.
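The abstract describes a student generator that imitates the teacher's intermediate feature maps online. As a rough illustration only (this is not the paper's code; the matrix-style channel adapter and the L1 criterion are assumptions), a feature-map distillation term can be sketched as:

```python
import numpy as np

def feature_distillation_loss(student_feats, teacher_feats, adapters):
    """Average L1 distance between adapted student feature maps and
    the teacher's feature maps, over the matched layers."""
    total = 0.0
    for s, t, w in zip(student_feats, teacher_feats, adapters):
        # A 1x1-conv-style adapter (here a plain matrix over channels)
        # lifts the student's thinner feature maps, e.g. C/4 channels,
        # up to the teacher's C channels before comparison.
        s_up = np.einsum('chw,dc->dhw', s, w)
        total += np.mean(np.abs(s_up - t))
    return total / len(student_feats)

# Toy check: one student layer with 4 channels distilled against a
# teacher layer with 16 channels through a random adapter.
rng = np.random.default_rng(0)
student = [rng.standard_normal((4, 8, 8))]
teacher = [rng.standard_normal((16, 8, 8))]
adapter = [rng.standard_normal((16, 4))]
loss = feature_distillation_loss(student, teacher, adapter)
```

In the online setting both networks are trained simultaneously, so a term like this would be added to the student's other losses at each step, with the teacher's activations treated as constants.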

Data availability
The data are available from the corresponding author on reasonable request. Three datasets are used in our experiments: VGGFace2, FaceForensics++, and CelebA-HQ. VGGFace2 is available at https://www.robots.ox.ac.uk/~vgg/data/vgg_face2/, FaceForensics++ at https://github.com/ondyari/FaceForensics, and CelebA-HQ at https://github.com/switchablenorms/CelebAMask-HQ.
References
Nirkin Y., Keller Y., Hassner T.: Fsgan: Subject agnostic face swapping and reenactment. In: proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp. 7184–7193 (2019). https://doi.org/10.1109/ICCV.2019.00728
Li L., Bao J., Yang H., Chen D., Wen F.: Advancing high fidelity identity swapping for forgery detection. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 5074–5083 (2020). https://doi.org/10.1109/CVPR42600.2020.00512
Xu Z., Yu X., Hong Z., Zhu Z., Han J., Liu J., Ding E., Bai X.: Facecontroller: Controllable attribute editing for face in the wild. In: proceedings of the AAAI conference on artificial intelligence (AAAI), pp. 3083–3091 (2021). https://doi.org/10.1609/aaai.v35i4.16417
Chen R., Chen X., Ni B., Ge Y.: Simswap: An efficient framework for high fidelity face swapping. In: proceedings of the 28th ACM international conference on multimedia, pp. 2003–2011 (2020). https://doi.org/10.1145/3394171.3413630
Xu Z., Hong Z., Ding C., Zhu Z., Han J., Liu J., Ding E.: Mobilefaceswap: A lightweight framework for video face swapping. In: proceedings of the AAAI conference on artificial intelligence (AAAI), pp. 2973–2981 (2022). https://doi.org/10.1609/aaai.v36i3.20203
Brabandere, B.D., Jia, X., Tuytelaars, T., Gool, L.V.: Dynamic filter networks. Adv. Neural. Inf. Process. Syst. 29, 667–675 (2016)
Karras T., Laine S., Aittala M., Hellsten J., Lehtinen J., Aila T.: Analyzing and improving the image quality of stylegan. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 8110–8119 (2020). https://doi.org/10.1109/CVPR42600.2020.00813
Hinton G., Vinyals O., Dean J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
Ren Y., Wu J., Xiao X., Yang J.: Online multi-granularity distillation for gan compression. In: proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp. 6793–6803 (2021). https://doi.org/10.1109/ICCV48922.2021.00672
Hu T., Lin M., You L., Chao F., Ji R.: Discriminator-cooperated feature map distillation for GAN compression. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 20351–20360 (2023). https://doi.org/10.1109/CVPR52729.2023.01949
Yuan G., Li M., Zhang Y., Zheng H.: ReliableSwap: boosting general face swapping via reliable supervision. arXiv preprint arXiv:2306.05356 (2023)
Deng J., Guo J., Xue N., Zafeiriou S.: Arcface: Additive angular margin loss for deep face recognition. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 4690–4699 (2019). https://doi.org/10.1109/CVPR.2019.00482
Blanz V., Vetter T.: A morphable model for the synthesis of 3D faces. In: proceedings of the 26th annual conference on computer graphics and interactive techniques, (1999)
Nirkin Y., Masi I., Tuan A.T., Hassner T., Medioni G.: On face segmentation, face swapping, and face perception. In: 2018 13th IEEE international conference on automatic face gesture recognition (FG), pp. 98–105 (2018). https://doi.org/10.1109/FG.2018.00024
Wang Y., Chen X., Zhu J., Chu W., Tai Y., Wang C., Li J., Wu Y., Huang F., Ji R.: HifiFace: 3D shape and semantic prior guided high fidelity face swapping. In: proceedings of the thirtieth international joint conference on artificial intelligence (IJCAI), pp. 1136–1142 (2021). https://doi.org/10.24963/ijcai.2021/157
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. Adv. Neural. Inf. Process. Syst. 27, 2672–2680 (2014)
Nirkin, Y., Keller, Y., Hassner, T.: FSGANv2: Improved subject agnostic face swapping and reenactment. IEEE Trans. Pattern Anal. Mach. Intell. 45(1), 560–575 (2022). https://doi.org/10.1109/TPAMI.2022.3155571
Huang X., Belongie S.: Arbitrary style transfer in real-time with adaptive instance normalization. In: proceedings of the IEEE international conference on computer vision (ICCV), pp. 1501–1510 (2017). https://doi.org/10.1109/ICCV.2017.167
Zhu Y., Li Q., Wang J., Xu C.-Z., Sun Z.: One shot face swapping on megapixels. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 4834–4844 (2021). https://doi.org/10.1109/CVPR46437.2021.00480
Xu Y., Deng B., Wang J., Jing Y., Pan J., He S.: High-resolution face swapping via latent semantics disentanglement. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 7642–7651 (2022). https://doi.org/10.1109/CVPR52688.2022.00749
Chen P., Liu S., Zhao H., Jia J.: Distilling knowledge via knowledge review. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 5008–5017 (2021). https://doi.org/10.1109/CVPR46437.2021.00497
Heo B., Kim J., Yun S., Park H., Kwak N., Choi J.Y.: A comprehensive overhaul of feature distillation. In: proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp. 1921–1930 (2019). https://doi.org/10.1109/ICCV.2019.00201
Li M., Lin J., Ding Y., Liu Z., Zhu J.-Y., Han S.: Gan compression: Efficient architectures for interactive conditional gans. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 5284–5294 (2020). https://doi.org/10.1109/tpami.2021.3126742
Zhang L., Chen X., Tu X., Wan P., Xu N., Ma K.: Wavelet knowledge distillation: Towards efficient image-to-image translation. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 12464–12474 (2022). https://doi.org/10.1109/CVPR52688.2022.01214
Ioffe S., Szegedy C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: proceedings of the international conference on machine learning (ICML), pp. 448–456 (2015)
He K., Zhang X., Ren S., Sun J.: Deep residual learning for image recognition. In: proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
Wang T.-C., Liu M.-Y., Zhu J.-Y., Tao A., Kautz J., Catanzaro B.: High-resolution image synthesis and semantic manipulation with conditional gans. In: proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp. 8798–8807 (2018). https://doi.org/10.1109/CVPR.2018.00917
Guo Y., Zhang L., Hu Y., He X., Gao J.: Ms-celeb-1m: A dataset and benchmark for large-scale face recognition. In: proceedings of the European conference on computer vision (ECCV), pp. 87–102 (2016). https://doi.org/10.1007/978-3-319-46487-9_6
Brock A., Donahue J., Simonyan K.: Large scale GAN training for high fidelity natural image synthesis. In: proceedings of the international conference on learning representations (ICLR), (2019)
Liu M.-Y., Huang X., Mallya A., Karras T., Aila T., Lehtinen J., Kautz J.: Few-shot unsupervised image-to-image translation. In: proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp. 10551–10560 (2019). https://doi.org/10.1109/ICCV.2019.01065
Park T., Liu M.-Y., Wang T.-C., Zhu J.-Y.: Semantic image synthesis with spatially-adaptive normalization. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 2337–2346 (2019). https://doi.org/10.1109/CVPR.2019.00244
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of wasserstein gans. Adv. Neural. Inf. Process. Syst. 30, 5769–5779 (2017). https://papers.nips.cc/paper/7159-improved-training-of-wasserstein-gans
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004). https://doi.org/10.1109/TIP.2003.819861
Johnson J., Alahi A., Fei-Fei L.: Perceptual losses for real-time style transfer and super-resolution. In: proceedings of the european conference on computer vision (ECCV), pp. 694–711 (2016). https://doi.org/10.1007/978-3-319-46475-6_43
Simonyan K., Zisserman A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014). https://doi.org/10.48550/arXiv.1409.1556
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60(1–4), 259–268 (1992). https://doi.org/10.1016/0167-2789(92)90242-F
zllrunning. face-parsing.pytorch. (2019). https://github.com/zllrunning/face-parsing.PyTorch
Cao Q., Shen L., Xie W., Parkhi O.M., Zisserman A.: Vggface2: A dataset for recognising faces across pose and age. In: 2018 13th IEEE international conference on automatic face gesture recognition (FG), pp. 67–74 (2018). https://doi.org/10.1109/FG.2018.00020
Rossler A., Cozzolino D., Verdoliva L., Riess C., Thies J., Nießner M.: Faceforensics++: Learning to detect manipulated facial images. In: proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp. 1–11 (2019). https://doi.org/10.1109/ICCV.2019.00009
Karras T., Aila T., Laine S., Lehtinen J.: Progressive growing of gans for improved quality, stability, and variation. In: proceedings of the international conference on learning representations (ICLR), (2018)
Langner, O., Dotsch, R., Bijlstra, G., Wigboldus, D.H., Hawk, S.T., Van Knippenberg, A.: Presentation and validation of the radboud faces database. Cogn. Emot. 24(8), 1377–1388 (2010). https://doi.org/10.1080/02699930903485076
DeepFakes. https://github.com/ondyari/FaceForensics/tree/master/dataset/DeepFakes
Rosberg F., Aksoy E.E., Alonso-Fernandez F., Englund C.: FaceDancer: Pose-and occlusion-aware high fidelity face swapping. In: proceedings of the IEEE/CVF winter conference on applications of computer vision (WACV), pp. 3454–3463 (2023). https://doi.org/10.1109/WACV56688.2023.00345
Liu Z., Li M., Zhang Y., Wang C., Zhang Q., Wang J., Nie Y.: Fine-grained face swapping via regional GAN inversion. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 8578–8587 (2023). https://doi.org/10.1109/CVPR52729.2023.00829
Karras T., Laine S., Aila T.: A style-based generator architecture for generative adversarial networks. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 4401–4410 (2019)
Wang H., Wang Y., Zhou Z., Ji X., Gong D., Zhou J., Li Z., Liu W.: Cosface: Large margin cosine loss for deep face recognition. In: proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp. 5265–5274 (2018). https://doi.org/10.1109/CVPR.2018.00552
Ruiz N., Chong E., Rehg J.M.: Fine-grained head pose estimation without keypoints. In: proceedings of the IEEE conference on computer vision and pattern recognition workshops (CVPRW), pp. 2074–2083 (2018). https://doi.org/10.1109/CVPRW.2018.00281
Deng Y., Yang J., Xu S., Chen D., Jia Y., Tong X.: Accurate 3d face reconstruction with weakly-supervised learning: From single image to image set. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), (2019). https://doi.org/10.1109/CVPRW.2019.00038
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: Gans trained by a two time-scale update rule converge to a local nash equilibrium. Adv. Neural. Inf. Process. Syst. 30, 6626–6637 (2017)
Acknowledgements
This work was supported in part by the Anhui Natural Science Foundation of China under grant number 2308085MF218, in part by the Academic Funding Project for Top Talents in University Disciplines under grant number gxbjZD2021050, in part by the Anhui Provincial Higher Education Institutions Scientific Research Project under grant number 2022AH040113, and in part by the Anhui University of Science and Technology 2023 Graduate Innovation Fund Project under grant number 2023cx2136.
Author information
Contributions
Yifeng Ding performed the experiments, analyzed the data, and wrote the original manuscript. All authors reviewed and edited the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
This study was conducted with the highest regard for ethical standards and following relevant guidelines and regulations. While no ethical review or approval was necessary for this particular study, the principles of academic integrity and research ethics were strictly adhered to throughout the research process.
Human and animal rights
The research protocol did not require ethical review or approval as it did not involve human participants, animals, or sensitive data. All data used in this study were obtained from publicly available sources and were properly cited and acknowledged. No private or personally identifiable information was used or accessed during this research.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, G., Ding, Y., Fang, X. et al. Fast face swapping with high-fidelity lightweight generator assisted by online knowledge distillation. Vis Comput 41, 1251–1271 (2025). https://doi.org/10.1007/s00371-024-03414-2
DOI: https://doi.org/10.1007/s00371-024-03414-2