EnNeRFACE: improving the generalization of face reenactment with adaptive ensemble neural radiance fields

Yang, Shuai; Qiao, Kai; Shi, Shuhao; Wang, Linyuan; Hu, Guoen; Yan, Bin; Chen, Jian

doi:10.1007/s00371-022-02709-6

Original article
Published: 31 October 2022

The Visual Computer, volume 39, pages 6015–6028 (2023)

Abstract

Face reenactment is a key technology in digital face editing. NeRFACE, a recently proposed face reenactment method based on neural radiance fields, reconstructs the training dataset much more accurately than previous methods. However, face reenactment in realistic scenes often encounters poses and expressions that were not seen during training, which demands further improvement of the model's generalization capability. Drawing on the idea of ensemble learning, we present EnNeRFACE, which uses an adaptive ensemble neural radiance fields architecture composed mainly of a set of subgenerators and a controller. We divide a short portrait video into non-intersecting sub-datasets based on temporal correlation, so that the trained subgenerators acquire differentiated modeling capabilities. In response to different expression vectors, the controller dynamically adjusts the weight assigned to each subgenerator so that the capabilities of the subgenerators are fully exploited. Extensive experiments show that EnNeRFACE achieves more stable and better generalization performance (i.e., identity preservation and manipulation of expression and pose) than state-of-the-art methods, demonstrating the effectiveness of the proposed adaptive ensemble structure.
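
The two mechanisms named in the abstract, a time-based split of the video into non-intersecting sub-datasets and an expression-dependent weighting of subgenerators, can be illustrated with a minimal sketch. The abstract does not specify the controller's form or how subgenerator outputs are merged, so everything below is an assumption made for illustration: the function names (split_by_time, controller_weights, ensemble_output), the linear controller followed by a softmax, and the weighted sum of outputs are hypothetical and are not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def split_by_time(num_frames: int, num_subsets: int) -> List[range]:
    """Split frame indices into contiguous, non-intersecting chunks.

    Assumption: "time correlation" is approximated by keeping temporally
    adjacent frames in the same sub-dataset.
    """
    base, extra = divmod(num_frames, num_subsets)
    chunks, start = [], 0
    for i in range(num_subsets):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        chunks.append(range(start, start + size))
        start += size
    return chunks


def softmax(scores: Vector) -> List[float]:
    """Numerically stable softmax; the result is non-negative and sums to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def controller_weights(expression: Vector, controller_matrix: Sequence[Vector]) -> List[float]:
    """Hypothetical linear controller: one score per subgenerator, then softmax."""
    scores = [sum(w * x for w, x in zip(row, expression)) for row in controller_matrix]
    return softmax(scores)


def ensemble_output(
    expression: Vector,
    subgenerators: Sequence[Callable[[Vector], Vector]],
    controller_matrix: Sequence[Vector],
) -> List[float]:
    """Blend subgenerator outputs with expression-dependent weights (assumed weighted sum)."""
    weights = controller_weights(expression, controller_matrix)
    outputs = [generate(expression) for generate in subgenerators]
    dim = len(outputs[0])
    return [sum(w * out[d] for w, out in zip(weights, outputs)) for d in range(dim)]
```

In this sketch each subgenerator would be trained on one chunk returned by split_by_time, and at inference time the controller decides, per expression vector, how much each subgenerator contributes. A softmax is used here only because it yields weights that sum to one; the paper may use a different normalization.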

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Author information

Authors and Affiliations

Henan Key Laboratory of Imaging and Intelligence Processing, PLA Strategy Support Force Information Engineering University, Science Avenue, Zhengzhou, 450001, Henan, China
Shuai Yang, Kai Qiao, Shuhao Shi, Linyuan Wang, Guoen Hu, Bin Yan & Jian Chen

Corresponding author

Correspondence to Jian Chen.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, S., Qiao, K., Shi, S. et al. EnNeRFACE: improving the generalization of face reenactment with adaptive ensemble neural radiance fields. Vis Comput 39, 6015–6028 (2023). https://doi.org/10.1007/s00371-022-02709-6

Accepted: 16 October 2022
Published: 31 October 2022
Issue Date: December 2023
DOI: https://doi.org/10.1007/s00371-022-02709-6
