Abstract
Indexing is the process of extracting a compact, meaningful, and pertinent signature that describes the content of data. This field has a broad spectrum of promising applications, such as Face in Video Recognition (FiVR), which motivates the interest of researchers around the world. Since a video contains a huge amount of data, extracting the relevant frames becomes an essential step prior to performing face recognition. In this context, we propose a new method for extracting keyframes from videos based on face quality and deep learning for a face recognition task. Faces are first detected with the MTCNN detector, which locates five landmarks (the two eyes, the two corners of the mouth, and the nose), delimits each face with a bounding box, and provides a confidence score. The method then proceeds in two steps. The first step generates a face quality score for each face in the data set prepared for the learning step; to generate these scores, we use three face feature extractors: Gabor, LBP, and HoG. The second step consists of training a deep Convolutional Neural Network in a supervised manner in order to select the frames having the best face quality. The obtained results show the effectiveness of the proposed method compared to state-of-the-art methods.
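The final selection stage described above can be illustrated with a minimal sketch. It assumes that each detected face already carries a detector confidence and a quality score predicted by the trained network; the `FrameFace` record, the `select_keyframes` helper, the 0.9 confidence threshold, and the `top_k` parameter are hypothetical names and values chosen for illustration, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameFace:
    """One detected face in one frame (hypothetical record for illustration)."""
    frame_index: int        # position of the frame in the video
    det_confidence: float   # detector confidence for the face, in [0, 1]
    quality: float          # predicted face quality score (higher is better)


def select_keyframes(faces: List[FrameFace],
                     min_confidence: float = 0.9,
                     top_k: int = 3) -> List[int]:
    """Return the indices of the top_k best-quality frames, in temporal order."""
    # Drop faces the detector is unsure about (the threshold is an assumption).
    kept = [f for f in faces if f.det_confidence >= min_confidence]
    # Rank the remaining faces by quality, best first; ties go to the earlier frame.
    ranked = sorted(kept, key=lambda f: (-f.quality, f.frame_index))
    # Report the selected frames in their original temporal order.
    return sorted(f.frame_index for f in ranked[:top_k])


faces = [
    FrameFace(0, 0.99, 0.41),
    FrameFace(1, 0.65, 0.95),   # high quality score but weak detection
    FrameFace(2, 0.98, 0.87),
    FrameFace(3, 0.97, 0.90),
    FrameFace(4, 0.95, 0.30),
]
print(select_keyframes(faces, top_k=2))  # prints [2, 3]
```

In this sketch, frame 1 is discarded despite its high quality score because its detection confidence falls below the assumed threshold. Whether the actual method filters on detector confidence in this way is not stated in the abstract.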
Change history
27 October 2020
The original version of this paper was updated to present the correct biography of the first and corresponding author and to present the missing photos and biographies of the second and third authors.
About this article
Cite this article
Abed, R., Bahroun, S. & Zagrouba, E. KeyFrame extraction based on face quality measurement and convolutional neural network for efficient face recognition in videos. Multimed Tools Appl 80, 23157–23179 (2021). https://doi.org/10.1007/s11042-020-09385-5
DOI: https://doi.org/10.1007/s11042-020-09385-5