
3D-2D deep convolutional neural network (DCNN) Cascade for robust video face identification

Multimedia Tools and Applications

Abstract

This paper proposes a novel video face identification method, named “3D-2D-DCNN cascade”, that serially combines 3D and 2D deep convolutional neural networks (DCNNs) for robust video face recognition (FR). In our method, an input video (face) sequence is first divided into a number of sub-video sequences, and each sub-video sequence is then fed to the 3D-DCNN to obtain a set of class-confidence scores for the given input video sequence. These class-confidence scores are aggregated in a novel way to form our class-confidence matrix. The key characteristic of our method is that this class-confidence matrix is used to fine-tune the 2D-DCNN, which is serially linked to the 3D-DCNN, to obtain the final face identification results. To verify the proposed method, two popular video face identification benchmarks, the COX Face and YouTube Celebrities (YTC) databases, were used. Compared with the best reported recognition results on these two benchmarks, our proposed method achieves better or comparable recognition performance.
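The abstract only outlines the data flow of the cascade. The following minimal sketch, using only the Python standard library, illustrates the first stage under stated assumptions: the sliding-window split (the hypothetical parameters `clip_len` and `stride`), the toy scorer `toy_3d_dcnn` standing in for a trained 3D-DCNN, and the row-wise stacking of softmax scores are all illustrative choices, not the paper's method. The abstract does not specify how the video is partitioned or how the scores are aggregated. The resulting matrix is the kind of 2D input a 2D-DCNN could then consume.

```python
import math
from typing import Callable, List, Sequence


def split_into_subsequences(frames: Sequence, clip_len: int, stride: int) -> List[list]:
    """Cut a face-frame sequence into fixed-length sub-video sequences.

    Assumption: a sliding window with hypothetical clip_len/stride parameters;
    trailing frames that do not fill a whole window are dropped.
    """
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    if len(frames) < clip_len:
        return []
    return [list(frames[s:s + clip_len])
            for s in range(0, len(frames) - clip_len + 1, stride)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns raw scores into class confidences."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_confidence_matrix(
    clips: List[list], score_clip: Callable[[list], Sequence[float]]
) -> List[List[float]]:
    """Stack one row of class-confidence scores per sub-video sequence.

    score_clip is a hypothetical stand-in for the trained 3D-DCNN and must
    return one logit per identity class. Row-wise stacking is an assumed
    aggregation; the paper's own aggregation is not given in the abstract.
    """
    return [softmax(score_clip(clip)) for clip in clips]


def toy_3d_dcnn(clip: list) -> List[float]:
    """Toy 3-class scorer on integer 'frames' (illustration only)."""
    mean = sum(clip) / len(clip)
    return [mean, -mean, 0.0]


# Usage: 10 dummy frames, windows of 4 frames with a stride of 2.
frames = list(range(10))
clips = split_into_subsequences(frames, clip_len=4, stride=2)
matrix = class_confidence_matrix(clips, toy_3d_dcnn)
print(len(matrix), "x", len(matrix[0]))  # prints "4 x 3"
```

Keeping one row per sub-video sequence preserves clip order, which is one plausible reason to treat the scores as a 2D input for a second network instead of simply averaging them; again, this arrangement is an illustrative assumption rather than a description of the paper's class-confidence matrix.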


Figs. 1–5



Funding

This research was supported by the Hankuk University of Foreign Studies Research Fund and by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education under Grant 2018R1D1A1A09082615.

Author information

Corresponding author

Correspondence to Jae Young Choi.


About this article


Cite this article

Kim, K.T., Lee, B. & Choi, J.Y. 3D-2D deep convolutional neural network (DCNN) Cascade for robust video face identification. Multimed Tools Appl 80, 4023–4036 (2021). https://doi.org/10.1007/s11042-020-09495-0



  • DOI: https://doi.org/10.1007/s11042-020-09495-0
