Effective multiple person recognition in random video sequences using a convolutional neural network

Puhalanthi, Niraimathi; Lin, Daw-Tung

doi:10.1007/s11042-019-7323-z

Published: 09 February 2019

Volume 79, pages 11125–11141, (2020)

Multimedia Tools and Applications

Abstract

Effective and efficient face recognition across pervasive networks of surveillance cameras is one of the most challenging objectives in computer vision. This study developed a real-time person recognition system (PRS) that identifies multiple people in video sequences. We focused on identifying approximately 9000 celebrities through intelligent preprocessing, training, and deployment of a deep convolutional neural network (CNN). The proposed PRS comprises three major steps. In the first step, the faces present in a given frame and their associated landmarks are detected. This detection must be precise because its accuracy limits the accuracy of the complete PRS. In the second step, the extracted facial regions of interest are aligned by affine warping based on their identified landmark positions. Alignment supports correct identification because, across a wide range of faces, different people can look intrinsically similar (interclass similarity). In the third step, a VGG-19 CNN is trained to classify the aligned facial images for person recognition. In the training phase of the PRS, we used images from the CASIA WebFace database, which contains nearly 9000 classes, aligned them using their facial landmarks, and trained a VGG-19 CNN classifier on the aligned images. For validation, the trained VGG was used to extract features from the images of the standard Labeled Faces in the Wild (LFW) database, and these features were used to train support vector machine classifiers. The resulting classification accuracy of approximately 96% was very close to the existing benchmark for the LFW database.
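As an illustration of the landmark-based affine alignment step, the following is a minimal sketch, not the authors' implementation. It assumes three non-collinear landmarks per face (for example the two eye centers and the mouth center) and a fixed template of target positions; the function names `estimate_affine` and `apply_affine` and all coordinates are hypothetical. An affine map has six unknowns, so three point correspondences determine it exactly.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def _solve3(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve the 3x3 linear system a @ x = b with Cramer's rule."""
    d = _det3(a)
    if abs(d) < 1e-12:
        raise ValueError("landmarks are collinear; the affine map is undefined")
    solution = []
    for col in range(3):
        replaced = [list(row) for row in a]
        for row in range(3):
            replaced[row][col] = b[row]
        solution.append(_det3(replaced) / d)
    return solution


def estimate_affine(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Return the 2x3 affine matrix that maps three src points onto three dst points."""
    if len(src) != 3 or len(dst) != 3:
        raise ValueError("exactly three point pairs are required")
    a = [[x, y, 1.0] for x, y in src]
    row_x = _solve3(a, [x for x, _ in dst])  # coefficients producing x'
    row_y = _solve3(a, [y for _, y in dst])  # coefficients producing y'
    return [row_x, row_y]


def apply_affine(matrix: Sequence[Sequence[float]], point: Point) -> Point:
    """Map one (x, y) point through a 2x3 affine matrix."""
    x, y = point
    return (
        matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
        matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
    )


if __name__ == "__main__":
    # Hypothetical detected landmarks: left eye, right eye, mouth center.
    detected = [(120.0, 95.0), (180.0, 105.0), (145.0, 170.0)]
    # Hypothetical template positions inside a 224x224 aligned crop.
    template = [(70.0, 90.0), (154.0, 90.0), (112.0, 170.0)]
    warp = estimate_affine(detected, template)
    for landmark in detected:
        print(apply_affine(warp, landmark))
```

In a full pipeline the same 2x3 matrix would be handed to an image-warping routine to resample the face crop; with more than three landmarks, a least-squares fit would replace the exact solve.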
During the testing phase, alternate frames of the input video were extracted, and the aligned faces detected in each frame were fed to the trained VGG to recognize the people in that frame. When tested on random samples of video images, the proposed PRS offered robust recognition performance for most facial regions with reasonable orientations and sizes. The average recognition time per person was approximately 370 milliseconds. The proposed deep learning-based PRS is the first of its kind to exhibit real-time performance for person recognition with significant accuracy, without requiring any prior knowledge of the people in a video.
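The frame sampling and the reported per-person timing can be related with a short sketch. It is illustrative only and rests on two assumptions not stated in the abstract: that "alternate frames" means keeping frames 0, 2, 4, and so on, and that faces are recognized one after another, so that cost scales linearly with the number of faces. The names `alternate_frames` and `estimated_seconds` are hypothetical.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

# Average recognition time per person reported in the abstract (seconds).
SECONDS_PER_PERSON = 0.370


def alternate_frames(frames: Iterable[T], step: int = 2) -> Iterator[T]:
    """Yield every `step`-th frame, starting with the first one."""
    if step < 1:
        raise ValueError("step must be at least 1")
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield frame


def estimated_seconds(
    num_frames: int,
    faces_per_frame: int,
    seconds_per_person: float = SECONDS_PER_PERSON,
    step: int = 2,
) -> float:
    """Rough recognition time for a clip, assuming serial per-face processing."""
    processed = (num_frames + step - 1) // step  # frames kept after sampling
    return processed * faces_per_frame * seconds_per_person


if __name__ == "__main__":
    print(list(alternate_frames(range(7))))  # [0, 2, 4, 6]
    print(estimated_seconds(num_frames=10, faces_per_frame=2))
```

Under these assumptions, skipping every other frame halves the number of recognition calls, which is one simple way to trade temporal resolution for throughput.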

Acknowledgements

This work was supported in part by the Ministry of Science and Technology, Taiwan, under Grants MOST 105-2221-E-305-006-MY3, MOST 105-2622-E-305-001-CC3, and MOST 106-2622-E-305-003-CC3, and by the Orbit Technology Incorporation. The authors thank the providers of the CASIA WebFace and LFW databases. The videos used to test the proposed PRS were chosen from YouTube.

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Taipei University, New Taipei City, Taiwan
Niraimathi Puhalanthi & Daw-Tung Lin

Corresponding author

Correspondence to Daw-Tung Lin.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Puhalanthi, N., Lin, DT. Effective multiple person recognition in random video sequences using a convolutional neural network. Multimed Tools Appl 79, 11125–11141 (2020). https://doi.org/10.1007/s11042-019-7323-z

Received: 17 June 2018
Revised: 03 January 2019
Accepted: 31 January 2019
Published: 09 February 2019
Issue Date: April 2020
DOI: https://doi.org/10.1007/s11042-019-7323-z
