
A comparison among keyframe extraction techniques for CNN classification based on video periocular images

Published in Multimedia Tools and Applications

Abstract

Training and validation sets of labeled data are essential components of supervised learning for building a classification model. During training, most learning algorithms use every image in the given training set to estimate the model's parameters. Video classification in particular requires a keyframe extraction technique to select representative frames for training, which is commonly based on simple heuristics such as low-level frame differences. Since some learning algorithms are noise-sensitive, frames for training must be chosen carefully so that the model's optimization converges faster and more accurately. In this paper we analyze four methodologies for selecting representative frames from a periocular video database: one based on threshold calculation (T), a modified Kennard-Stone (KS) model, a third based on the sum of absolute differences in the LUV colorspace, and random sampling. To evaluate the selected image sets we use two deep-network methodologies: feature extraction (FE) and fine-tuning (FT). The results show that, with a reduced number of training images, the modified KS refinement methodology combined with FT evaluation achieves the same accuracy as the complete database.
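The paper's modified KS variant is not detailed in the abstract, but the classical Kennard-Stone selection it builds on is well known: seed with the two mutually most distant samples, then repeatedly add the sample whose nearest selected neighbour is farthest away (max-min). The sketch below illustrates that baseline on feature vectors (e.g. one per frame); the function name and use of plain Euclidean distance are our own illustrative choices, not the authors' implementation.

```python
import numpy as np

def kennard_stone(X, k):
    """Select k representative rows of X (n x d) by classical Kennard-Stone.

    Seeds with the two mutually farthest samples, then greedily adds the
    sample whose distance to its nearest already-selected sample is largest.
    """
    # pairwise squared Euclidean distances between all samples
    d = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # seed: the two samples farthest from each other
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    # distance from every sample to its nearest selected sample
    min_d = np.minimum(d[i], d[j])
    while len(selected) < k:
        min_d[selected] = -1.0          # never re-pick a selected sample
        nxt = int(np.argmax(min_d))     # farthest from the selected set
        selected.append(nxt)
        min_d = np.minimum(min_d, d[nxt])
    return selected
```

Applied to per-frame feature vectors, this yields a training subset that covers the feature space more evenly than random sampling, which is the intuition behind using KS-style refinement for keyframe selection.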




Acknowledgments

The authors would like to thank NVIDIA for the GPU donation. We would also like to thank CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) for financial support, Finance Code 001.

Author information


Corresponding author

Correspondence to Carolina Toledo Ferraz.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Toledo Ferraz, C., Barcellos, W., Pereira Junior, O. et al. A comparison among keyframe extraction techniques for CNN classification based on video periocular images. Multimed Tools Appl 80, 12843–12856 (2021). https://doi.org/10.1007/s11042-020-10384-9

