Hybrid attention network and center-guided non-maximum suppression for occluded face detection

Multimedia Tools and Applications

Abstract

Face detection has advanced significantly with the widespread adoption of convolutional neural networks. However, occlusion of various types is common in face detection; it inevitably destroys the visual features of faces and makes post-processing considerably harder. These problems make occluded face detection a challenging and important task. In this paper, we propose an occlusion-aware face detector (OFDet) for occluded face detection. It has two main components: a hybrid attention module (HAM) and a center-guided non-maximum suppression (cgNMS) algorithm. The HAM integrates three types of attention blocks in a hybrid manner: a spatial attention block (SAB), a channel attention block (CAB), and a channel-spatial attention block (CSAB). This module helps the network learn more discriminative and robust feature representations by adaptively highlighting the features of the more informative visible facial regions and weakening the features of occluded facial regions, which helps to address the inter-class occlusion issue. The cgNMS introduces the center-point distance between detected boxes as a new suppression metric that supplements the traditional intersection over union (IoU) metric. This dual-metric design allows cgNMS to post-process highly overlapped detected boxes correctly, which addresses the intra-class occlusion problem. Experimental results show that OFDet achieves state-of-the-art results on the MAFA dataset and competitive results on the WIDER FACE and FDDB datasets, demonstrating the effectiveness of our method. In addition, HAM and cgNMS are highly efficient, and their cost has almost no effect on the efficiency of the model.
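The dual-metric idea behind cgNMS can be illustrated with a minimal sketch. The abstract does not state the exact suppression rule, so everything specific here is an assumption made for illustration: a box is suppressed only when its IoU with a higher-scoring kept box is high and their centers are close, the center distance is normalized by the diagonal of the smallest box enclosing both, and the names `dual_metric_nms`, `iou_thresh`, and `dist_thresh` are hypothetical rather than taken from the paper.

```python
from math import hypot


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def center_distance(a, b):
    """Distance between box centers, divided by the diagonal of the
    smallest enclosing box (an assumed normalization, not the paper's)."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    ex1, ey1 = min(a[0], b[0]), min(a[1], b[1])
    ex2, ey2 = max(a[2], b[2]), max(a[3], b[3])
    diag = hypot(ex2 - ex1, ey2 - ey1)
    return hypot(ax - bx, ay - by) / diag if diag > 0 else 0.0


def dual_metric_nms(boxes, scores, iou_thresh=0.4, dist_thresh=0.2):
    """Greedy NMS with an extra center-distance test (hypothetical rule).

    A candidate is dropped only if some already-kept box overlaps it
    strongly (IoU above iou_thresh) AND has a nearby center (normalized
    distance below dist_thresh). Returns kept indices, best score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        suppressed = any(
            iou(boxes[i], boxes[k]) > iou_thresh
            and center_distance(boxes[i], boxes[k]) < dist_thresh
            for k in keep
        )
        if not suppressed:
            keep.append(i)
    return keep


# Box 1 nearly duplicates box 0; box 2 is a neighboring face that
# overlaps box 0 heavily but has a clearly different center.
boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (40, 0, 140, 100)]
scores = [0.9, 0.8, 0.7]
kept_dual = dual_metric_nms(boxes, scores)
kept_iou_only = dual_metric_nms(boxes, scores, dist_thresh=float("inf"))
```

With these made-up boxes, the dual-metric rule keeps boxes 0 and 2, whereas disabling the distance test (an infinite `dist_thresh`, which reduces the function to ordinary IoU-based NMS) keeps only box 0. This is the kind of face-on-face overlap the abstract calls intra-class occlusion.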

Acknowledgments

This work is partially supported by the Key Research and Development Program of Shaanxi, China (Program No. 2021ZDLGY15-01).

Author information

Corresponding author

Correspondence to Huifang Li.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Data availability statement

The MAFA dataset, WIDER FACE dataset, and FDDB dataset used in the study are publicly available. The MAFA dataset can be downloaded from the website: http://www.escience.cn/people/geshiming/mafa.html. The WIDER FACE dataset can be downloaded from the website: http://shuoyang1213.me/WIDERFACE/. The FDDB dataset can be downloaded from the website: http://vis-www.cs.umass.edu/fddb/index.html.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jin, M., Li, H. & Xia, Z. Hybrid attention network and center-guided non-maximum suppression for occluded face detection. Multimed Tools Appl 82, 15143–15170 (2023). https://doi.org/10.1007/s11042-022-13999-2

  • DOI: https://doi.org/10.1007/s11042-022-13999-2
