Abstract
Anchor-free face detection methods can cover a wide range of scales and run faster than anchor-based ones. However, their accuracy still lags well behind anchor-based methods, especially on small faces, because they struggle with context modeling and scale imbalance. In this study, to address these problems, we propose a novel single shot anchor-free face detector (SAFD) that detects multi-scale faces by leveraging multi-scale, context-aware information from multi-layer features. In SAFD, we use dilated convolution layers and an attention mechanism to select informative features that adapt to different scales. We also propose a scale-aware sampling strategy that mitigates the scale imbalance problem by adaptively selecting positive training samples. Experimental results on two public benchmark datasets, WIDER FACE and FDDB, demonstrate that SAFD achieves performance competitive with anchor-based detectors at a lower computational cost.
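As an illustration of why dilated convolutions are a natural tool for multi-scale context, the following minimal sketch shows how the spatial extent of a fixed-size kernel grows with the dilation rate while the number of weights stays the same. It is not the SAFD architecture: the function names, the 1-D setting, and the dilation rates 1, 2, and 4 are assumptions chosen for illustration only, and the abstract does not specify the rates or layer layout actually used.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1-D dilated cross-correlation, written out explicitly."""
    span = effective_kernel_size(len(kernel), dilation)
    outputs = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart in the input.
        outputs.append(
            sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        )
    return outputs


if __name__ == "__main__":
    # Hypothetical dilation rates; the same 3-tap kernel is reused for each.
    for rate in (1, 2, 4):
        print(f"dilation={rate}: covers {effective_kernel_size(3, rate)} samples")

    # A 3-tap difference kernel applied with dilation 2 to a linear ramp.
    print(dilated_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 0, -1], dilation=2))
```

With a 3-tap kernel, the three rates cover 3, 5, and 9 input samples respectively, so parallel branches with different rates see different amounts of context at equal parameter cost. How those branches are weighted against each other is where an attention mechanism would come in; that part is not sketched here.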













Cite this article
Wang, C., Luo, Z., Zhong, Z. et al. SAFD: single shot anchor free face detector. Multimed Tools Appl 80, 13761–13785 (2021). https://doi.org/10.1007/s11042-020-10401-x
DOI: https://doi.org/10.1007/s11042-020-10401-x