Enhancing face detection in video sequences by video segmentation preprocessing

Applied Intelligence

Abstract

In recent years, several learning-based methods have been proposed to detect and locate humans in real time using convolutional neural networks (CNNs). However, these methods require high-performance graphics processing units (GPUs). To address this problem, a preprocessing procedure based on video segmentation is proposed to speed up face detection. In addition, an acceleration toolkit is employed in this study to perform face detection in real time on a standard central processing unit (CPU). Experimental results indicate that the proposed method achieves an F1-score of 93.2% and runs at 4.5 times real-time speed on one CPU, measured on 155,883 test frames from the RAI dataset, YouTube, and YOUKU. Notably, when a video sequence contains few frames with human faces, the highest speed is nearly 18 times that achieved without video segmentation.
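The idea of using video segmentation as a preprocessing step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how shots are found or how the detector is gated, so the sketch assumes that a shot boundary is declared when the similarity between consecutive frames falls below a threshold, and that the face detector runs on the first frame of each shot and is skipped for the rest of the shot when no face is found there. The names `split_into_shots`, `detect_with_segmentation`, `same_scene`, and `has_face`, as well as the threshold value, are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def split_into_shots(frames: Sequence, similarity: Callable, threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Split frames into shots as (start, end) index ranges, end exclusive.

    Assumption: a new shot starts wherever the similarity between two
    consecutive frames drops below `threshold`.
    """
    if not frames:
        return []
    shots = []
    start = 0
    for i in range(1, len(frames)):
        if similarity(frames[i - 1], frames[i]) < threshold:
            shots.append((start, i))
            start = i
    shots.append((start, len(frames)))
    return shots


def detect_with_segmentation(frames: Sequence, similarity: Callable, has_face: Callable, threshold: float = 0.5):
    """Return per-frame face labels and the number of detector calls.

    Assumption: if the first frame of a shot has no face, the whole shot
    is treated as face-free and the detector is not run on its other frames.
    """
    labels = [False] * len(frames)
    calls = 0
    for start, end in split_into_shots(frames, similarity, threshold):
        calls += 1
        if not has_face(frames[start]):
            continue  # skip the remaining frames of this shot
        labels[start] = True
        for i in range(start + 1, end):
            calls += 1
            labels[i] = has_face(frames[i])
    return labels, calls


# Toy data: each "frame" is (scene_id, contains_face). Real frames would be
# images compared with an image similarity measure.
frames = [("a", False)] * 6 + [("b", True)] * 2 + [("c", False)] * 4


def same_scene(f1, f2) -> float:
    # Stand-in similarity: 1.0 within a scene, 0.0 across a scene change.
    return 1.0 if f1[0] == f2[0] else 0.0


def has_face(frame) -> bool:
    # Stand-in for a CNN face detector.
    return frame[1]


labels, calls = detect_with_segmentation(frames, same_scene, has_face)
print(f"detector calls: {calls} of {len(frames)} frames")
```

In this toy sequence only the middle shot contains faces, so the detector is called on a small fraction of the frames. This is in line with the abstract's observation that the speed gain is largest for videos with few face frames, although the gating rule used here is an illustrative assumption and can miss faces that appear mid-shot.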



Author information

Corresponding author

Correspondence to Zuoxun Fan.

Cite this article

Liu, H., Fan, Z., Chen, Q. et al. Enhancing face detection in video sequences by video segmentation preprocessing. Appl Intell 53, 2897–2907 (2023). https://doi.org/10.1007/s10489-022-03608-y

  • DOI: https://doi.org/10.1007/s10489-022-03608-y
