Abstract
Advanced face swapping approaches have achieved high-fidelity results. However, the success of most methods hinges on large parameter counts and high computational costs, which have become obstacles to swap speed and deployment as real-time face swapping grows in popularity. To overcome these challenges, we propose a high-fidelity lightweight generator (HFLG) for face swapping, a compressed version of the existing SimSwap network that uses only 1/4 of its channels. Moreover, to stabilize the training of HFLG, we introduce feature-map-based online knowledge distillation into our training process and improve the teacher–student architecture. Specifically, we first enhance the teacher generator to provide more effective guidance, minimizing the loss of detail on the lower face. In addition, a new identity-irrelevant similarity loss is proposed to improve the preservation of non-facial regions in the teacher generator's results. Furthermore, HFLG uses an extended identity injection module to inject identity more efficiently, and it gradually learns face swapping by imitating the feature maps and outputs of the teacher generator online. Extensive experiments on faces in the wild demonstrate that our method achieves results comparable to other methods while having fewer parameters, lower computational cost, and faster inference. The code is available at https://github.com/EifelTing/HFLFS.
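The abstract describes a student generator that imitates the teacher's intermediate feature maps online. As a rough illustration only (this is not the paper's code; the matrix-style channel adapter and the L1 criterion are assumptions), a feature-map distillation term can be sketched as:

```python
import numpy as np

def feature_distillation_loss(student_feats, teacher_feats, adapters):
    """Average L1 distance between adapted student feature maps and
    the teacher's feature maps, over the matched layers."""
    total = 0.0
    for s, t, w in zip(student_feats, teacher_feats, adapters):
        # A 1x1-conv-style adapter (here a plain matrix over channels)
        # lifts the student's thinner feature maps, e.g. C/4 channels,
        # up to the teacher's C channels before comparison.
        s_up = np.einsum('chw,dc->dhw', s, w)
        total += np.mean(np.abs(s_up - t))
    return total / len(student_feats)

# Toy check: one student layer with 4 channels distilled against a
# teacher layer with 16 channels through a random adapter.
rng = np.random.default_rng(0)
student = [rng.standard_normal((4, 8, 8))]
teacher = [rng.standard_normal((16, 8, 8))]
adapter = [rng.standard_normal((16, 4))]
loss = feature_distillation_loss(student, teacher, adapter)
```

In the online setting both networks are trained simultaneously, so a term like this would be added to the student's other losses at each step, with the teacher's activations treated as constants.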

Data availability
The data are available from the corresponding author on reasonable request. Three datasets are used in our experiments: VGGFace2, FaceForensics++, and CelebA-HQ. VGGFace2 is available at https://www.robots.ox.ac.uk/~vgg/data/vgg_face2/, FaceForensics++ at https://github.com/ondyari/FaceForensics, and CelebA-HQ at https://github.com/switchablenorms/CelebAMask-HQ.
References
Nirkin Y., Keller Y., Hassner T.: Fsgan: Subject agnostic face swapping and reenactment. In: proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp. 7184–7193 (2019). https://doi.org/10.1109/ICCV.2019.00728
Li L., Bao J., Yang H., Chen D., Wen F.: Advancing high fidelity identity swapping for forgery detection. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 5074–5083 (2020). https://doi.org/10.1109/CVPR42600.2020.00512
Xu Z., Yu X., Hong Z., Zhu Z., Han J., Liu J., Ding E., Bai X.: Facecontroller: Controllable attribute editing for face in the wild. In: proceedings of the AAAI conference on artificial intelligence (AAAI), pp. 3083–3091 (2021). https://doi.org/10.1609/aaai.v35i4.16417
Chen R., Chen X., Ni B., Ge Y.: Simswap: An efficient framework for high fidelity face swapping. In: proceedings of the 28th ACM international conference on multimedia, pp. 2003–2011 (2020). https://doi.org/10.1145/3394171.3413630
Xu Z., Hong Z., Ding C., Zhu Z., Han J., Liu J., Ding E.: Mobilefaceswap: A lightweight framework for video face swapping. In: proceedings of the AAAI conference on artificial intelligence (AAAI), pp. 2973–2981 (2022). https://doi.org/10.1609/aaai.v36i3.20203
Brabandere, B.D., Jia, X., Tuytelaars, T., Gool, L.V.: Dynamic filter networks. Adv. Neural. Inf. Process. Syst. 29, 667–675 (2016)
Karras T., Laine S., Aittala M., Hellsten J., Lehtinen J., Aila T.: Analyzing and improving the image quality of stylegan. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 8110–8119 (2020). https://doi.org/10.1109/CVPR42600.2020.00813
Hinton G., Vinyals O., Dean J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
Ren Y., Wu J., Xiao X., Yang J.: Online multi-granularity distillation for gan compression. In: proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp. 6793–6803 (2021). https://doi.org/10.1109/ICCV48922.2021.00672
Hu T., Lin M., You L., Chao F., Ji R.: Discriminator-cooperated feature map distillation for GAN compression. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 20351–20360 (2023). https://doi.org/10.1109/CVPR52729.2023.01949
Yuan G., Li M., Zhang Y., Zheng H.: ReliableSwap: boosting general face swapping via reliable supervision. arXiv preprint arXiv:2306.05356 (2023)
Deng J., Guo J., Xue N., Zafeiriou S.: Arcface: Additive angular margin loss for deep face recognition. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 4690–4699 (2019). https://doi.org/10.1109/CVPR.2019.00482
Blanz V., Vetter T.: A morphable model for the synthesis of 3D faces. In: proceedings of the 26th annual conference on computer graphics and interactive techniques, (1999)
Nirkin Y., Masi I., Tuan A.T., Hassner T., Medioni G.: On face segmentation, face swapping, and face perception. In: 2018 13th IEEE international conference on automatic face gesture recognition (FG), pp. 98–105 (2018). https://doi.org/10.1109/FG.2018.00024
Wang Y., Chen X., Zhu J., Chu W., Tai Y., Wang C., Li J., Wu Y., Huang F., Ji R.: HifiFace: 3D shape and semantic prior guided high fidelity face swapping. In: proceedings of the thirtieth international joint conference on artificial intelligence (IJCAI), pp. 1136–1142 (2021). https://doi.org/10.24963/ijcai.2021/157
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. Adv. Neural. Inf. Process. Syst. 27, 2672–2680 (2014)
Nirkin, Y., Keller, Y., Hassner, T.: FSGANv2: Improved subject agnostic face swapping and reenactment. IEEE Trans. Pattern Anal. Mach. Intell. 45(1), 560–575 (2022). https://doi.org/10.1109/TPAMI.2022.3155571
Huang X., Belongie S.: Arbitrary style transfer in real-time with adaptive instance normalization. In: proceedings of the IEEE international conference on computer vision (ICCV), pp. 1501–1510 (2017). https://doi.org/10.1109/ICCV.2017.167
Zhu Y., Li Q., Wang J., Xu C.-Z., Sun Z.: One shot face swapping on megapixels. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 4834–4844 (2021). https://doi.org/10.1109/CVPR46437.2021.00480
Xu Y., Deng B., Wang J., Jing Y., Pan J., He S.: High-resolution face swapping via latent semantics disentanglement. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 7642–7651 (2022). https://doi.org/10.1109/CVPR52688.2022.00749
Chen P., Liu S., Zhao H., Jia J.: Distilling knowledge via knowledge review. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 5008–5017 (2021). https://doi.org/10.1109/CVPR46437.2021.00497
Heo B., Kim J., Yun S., Park H., Kwak N., Choi J.Y.: A comprehensive overhaul of feature distillation. In: proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp. 1921–1930 (2019). https://doi.org/10.1109/ICCV.2019.00201
Li M., Lin J., Ding Y., Liu Z., Zhu J.-Y., Han S.: Gan compression: Efficient architectures for interactive conditional gans. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 5284–5294 (2020). https://doi.org/10.1109/tpami.2021.3126742
Zhang L., Chen X., Tu X., Wan P., Xu N., Ma K.: Wavelet knowledge distillation: Towards efficient image-to-image translation. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 12464–12474 (2022). https://doi.org/10.1109/CVPR52688.2022.01214
Ioffe S., Szegedy C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: proceedings of the international conference on machine learning (ICML), pp. 448–456 (2015)
He K., Zhang X., Ren S., Sun J.: Deep residual learning for image recognition. In: proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
Wang T.-C., Liu M.-Y., Zhu J.-Y., Tao A., Kautz J., Catanzaro B.: High-resolution image synthesis and semantic manipulation with conditional gans. In: proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp. 8798–8807 (2018). https://doi.org/10.1109/CVPR.2018.00917
Guo Y., Zhang L., Hu Y., He X., Gao J.: Ms-celeb-1m: A dataset and benchmark for large-scale face recognition. In: proceedings of the European conference on computer vision (ECCV), pp. 87–102 (2016). https://doi.org/10.1007/978-3-319-46487-9_6
Brock A., Donahue J., Simonyan K.: Large scale GAN training for high fidelity natural image synthesis. In: proceedings of the international conference on learning representations (ICLR), (2019)
Liu M.-Y., Huang X., Mallya A., Karras T., Aila T., Lehtinen J., Kautz J.: Few-shot unsupervised image-to-image translation. In: proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp. 10551–10560 (2019). https://doi.org/10.1109/ICCV.2019.01065
Park T., Liu M.-Y., Wang T.-C., Zhu J.-Y.: Semantic image synthesis with spatially-adaptive normalization. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 2337–2346 (2019). https://doi.org/10.1109/CVPR.2019.00244
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of wasserstein gans. Adv. Neural. Inf. Process. Syst. 30, 5769–5779 (2017). https://papers.nips.cc/paper/7159-improved-training-of-wasserstein-gans
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004). https://doi.org/10.1109/TIP.2003.819861
Johnson J., Alahi A., Fei-Fei L.: Perceptual losses for real-time style transfer and super-resolution. In: proceedings of the european conference on computer vision (ECCV), pp. 694–711 (2016). https://doi.org/10.1007/978-3-319-46475-6_43
Simonyan K., Zisserman A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014). https://doi.org/10.48550/arXiv.1409.1556
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60(1–4), 259–268 (1992). https://doi.org/10.1016/0167-2789(92)90242-F
zllrunning. face-parsing.pytorch. (2019). https://github.com/zllrunning/face-parsing.PyTorch
Cao Q., Shen L., Xie W., Parkhi O.M., Zisserman A.: Vggface2: A dataset for recognising faces across pose and age. In: 2018 13th IEEE international conference on automatic face gesture recognition (FG), pp. 67–74 (2018). https://doi.org/10.1109/FG.2018.00020
Rossler A., Cozzolino D., Verdoliva L., Riess C., Thies J., Nießner M.: Faceforensics++: Learning to detect manipulated facial images. In: proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp. 1–11 (2019). https://doi.org/10.1109/ICCV.2019.00009
Karras T., Aila T., Laine S., Lehtinen J.: Progressive growing of gans for improved quality, stability, and variation. In: proceedings of the international conference on learning representations (ICLR), (2018)
Langner, O., Dotsch, R., Bijlstra, G., Wigboldus, D.H., Hawk, S.T., Van Knippenberg, A.: Presentation and validation of the radboud faces database. Cogn. Emot. 24(8), 1377–1388 (2010). https://doi.org/10.1080/02699930903485076
DeepFakes. https://github.com/ondyari/FaceForensics/tree/master/dataset/DeepFakes
Rosberg F., Aksoy E.E., Alonso-Fernandez F., Englund C.: FaceDancer: Pose-and occlusion-aware high fidelity face swapping. In: proceedings of the IEEE/CVF winter conference on applications of computer vision (WACV), pp. 3454–3463 (2023). https://doi.org/10.1109/WACV56688.2023.00345
Liu Z., Li M., Zhang Y., Wang C., Zhang Q., Wang J., Nie Y.: Fine-grained face swapping via regional GAN inversion. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 8578–8587 (2023). https://doi.org/10.1109/CVPR52729.2023.00829
Karras T., Laine S., Aila T.: A style-based generator architecture for generative adversarial networks. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 4401–4410 (2019)
Wang H., Wang Y., Zhou Z., Ji X., Gong D., Zhou J., Li Z., Liu W.: Cosface: Large margin cosine loss for deep face recognition. In: proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp. 5265–5274 (2018). https://doi.org/10.1109/CVPR.2018.00552
Ruiz N., Chong E., Rehg J.M.: Fine-grained head pose estimation without keypoints. In: proceedings of the IEEE conference on computer vision and pattern recognition workshops (CVPRW), pp. 2074–2083 (2018). https://doi.org/10.1109/CVPRW.2018.00281
Deng Y., Yang J., Xu S., Chen D., Jia Y., Tong X.: Accurate 3d face reconstruction with weakly-supervised learning: From single image to image set. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), (2019). https://doi.org/10.1109/CVPRW.2019.00038
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: Gans trained by a two time-scale update rule converge to a local nash equilibrium. Adv. Neural. Inf. Process. Syst. 30, 6626–6637 (2017)
Acknowledgements
This work was supported in part by the Anhui Natural Science Foundation of China under grant number 2308085MF218, in part by the Academic Funding Project for Top Talents in University Disciplines under grant number gxbjZD2021050, in part by the Anhui Provincial Higher Education Institutions Scientific Research Project under grant number 2022AH040113, and in part by the Anhui University of Science and Technology 2023 Graduate Innovation Fund Project under grant number 2023cx2136.
Author information
Contributions
Yifeng Ding performed the experiments, analyzed the data, and wrote the original manuscript. All authors reviewed and edited the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
This study was conducted with the highest regard for ethical standards and following relevant guidelines and regulations. While no ethical review or approval was necessary for this particular study, the principles of academic integrity and research ethics were strictly adhered to throughout the research process.
Human and animal rights
The research protocol did not require ethical review or approval as it did not involve human participants, animals, or sensitive data. All data used in this study were obtained from publicly available sources and were properly cited and acknowledged. No private or personally identifiable information was used or accessed during this research.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, G., Ding, Y., Fang, X. et al. Fast face swapping with high-fidelity lightweight generator assisted by online knowledge distillation. Vis Comput 41, 1251–1271 (2025). https://doi.org/10.1007/s00371-024-03414-2
DOI: https://doi.org/10.1007/s00371-024-03414-2