
A comparison among keyframe extraction techniques for CNN classification based on video periocular images

Published in Multimedia Tools and Applications

Abstract

Training and validation sets of labeled data are essential components of supervised learning for building a classification model. During training, most learning algorithms use every image in the given training set to estimate the model's parameters. Video classification in particular requires a keyframe extraction technique to select representative frames for training, which is commonly based on simple heuristics such as low-level frame differences. Since some learning algorithms are noise-sensitive, frames for training must be chosen carefully so that the model's optimization converges faster and more accurately. In this paper we analyze four methodologies for selecting representative frames from a periocular video database: one based on threshold calculation (T), a modified Kennard-Stone (KS) model, a third based on the sum of absolute differences in the LUV colorspace, and random sampling. To evaluate the selected image sets we use two deep-network methodologies: feature extraction (FE) and fine-tuning (FT). The results show that, with a reduced number of training images, the modified KS refinement methodology combined with FT evaluation achieves the same accuracy as the complete database.
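The paper's modified KS variant is not detailed in the abstract, but the classical Kennard-Stone selection it builds on is well known: seed with the two mutually most distant samples, then repeatedly add the sample whose nearest selected neighbour is farthest away (max-min). The sketch below illustrates that baseline on feature vectors (e.g. one per frame); the function name and use of plain Euclidean distance are our own illustrative choices, not the authors' implementation.

```python
import numpy as np

def kennard_stone(X, k):
    """Select k representative rows of X (n x d) by classical Kennard-Stone.

    Seeds with the two mutually farthest samples, then greedily adds the
    sample whose distance to its nearest already-selected sample is largest.
    """
    # pairwise squared Euclidean distances between all samples
    d = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # seed: the two samples farthest from each other
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    # distance from every sample to its nearest selected sample
    min_d = np.minimum(d[i], d[j])
    while len(selected) < k:
        min_d[selected] = -1.0          # never re-pick a selected sample
        nxt = int(np.argmax(min_d))     # farthest from the selected set
        selected.append(nxt)
        min_d = np.minimum(min_d, d[nxt])
    return selected
```

Applied to per-frame feature vectors, this yields a training subset that covers the feature space more evenly than random sampling, which is the intuition behind using KS-style refinement for keyframe selection.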




Acknowledgments

The authors would like to thank NVIDIA for the GPU donation. We would also like to thank CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) for financial support, Finance Code 001.

Author information


Corresponding author

Correspondence to Carolina Toledo Ferraz.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Toledo Ferraz, C., Barcellos, W., Pereira Junior, O. et al. A comparison among keyframe extraction techniques for CNN classification based on video periocular images. Multimed Tools Appl 80, 12843–12856 (2021). https://doi.org/10.1007/s11042-020-10384-9

