Abstract
Face detection has broad applications. Recently, deep learning methods have driven considerable progress in face detection. However, detecting small faces in real-world environments remains challenging because of their low resolution, variability in size, diverse poses and occlusions. YOLOv3 is one of the main approaches to object detection and achieves comparatively good performance on small targets in real time. However, it still struggles to detect groups of small faces, producing inaccurate localization and an increasing number of false positives. In this paper, we propose an efficient multiscale deep learning network based on YOLOv3 to detect groups of small faces. First, we select the optimum number of anchors, which helps the model better capture small face targets; second, we replace the bounding box regression loss in YOLOv3 with the CIoU loss to reduce false positives; third, we extend the number of detection scales in YOLOv3 from 3 to 4, specifically to detect small faces; fourth, in each detection scale we simplify the six convolutional layers into two residual blocks (four convolutional layers) to avoid vanishing gradients. The proposed model achieves state-of-the-art performance on the WIDER FACE face detection benchmark, especially on the hard subset, which contains a high number of small faces with large variations in scale, pose and occlusion. Our model achieves 86.5% AP on the WIDER FACE hard validation subset, compared with 72.9% AP for YOLOv3. Its run time is also satisfactory for real applications: it processes VGA-resolution images at 64.3 FPS on an Nvidia Titan RTX.
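The CIoU loss comes from Zheng et al. (Distance-IoU loss, AAAI 2020, listed in the references). As a rough sketch of what it computes, the Python function below evaluates it for a single pair of boxes: 1 - IoU, plus the squared distance between box centres divided by the squared diagonal of the smallest enclosing box, plus an aspect-ratio penalty alpha * v. The function name, the (x1, y1, x2, y2) box format and the eps constant are illustrative assumptions; this is not the authors' training code, which would operate on batched tensors.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """Complete-IoU loss for one pair of boxes.

    Assumption (illustrative only): boxes are (x1, y1, x2, y2) with x2 > x1, y2 > y1.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Plain intersection-over-union
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # rho^2: squared Euclidean distance between the two box centres
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2

    # c^2: squared diagonal of the smallest box enclosing both boxes
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # v measures aspect-ratio inconsistency; alpha is its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

For two non-overlapping unit squares one unit apart horizontally, IoU is 0, rho^2 is 4 and c^2 is 10, so the loss is about 1.4. Unlike a plain 1 - IoU loss, it still gives a distance signal when the boxes do not overlap.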
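The abstract does not say how the optimum number of anchors is chosen. One common approach, used for YOLOv2 and YOLOv3, is k-means clustering of ground-truth box widths and heights with distance 1 - IoU, then comparing the average best-anchor IoU across several values of k. The sketch below assumes that approach. The helper names (wh_iou, kmeans_anchors, mean_best_iou) are hypothetical, and the code should not be read as the authors' exact procedure.

```python
import random


def wh_iou(box, anchor):
    """IoU of two (w, h) boxes aligned at a common corner (YOLOv2-style clustering)."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    return inter / (box[0] * box[1] + anchor[0] * anchor[1] - inter)


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) box sizes into k anchors using the distance d = 1 - IoU.

    Assumption: this mirrors common YOLO practice, not necessarily the paper's method.
    """
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # initialise with k randomly chosen boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # Highest IoU is the same as lowest 1 - IoU distance
            best = max(range(k), key=lambda i: wh_iou(b, anchors[i]))
            clusters[best].append(b)
        # Move each anchor to the mean size of its cluster; keep it if the cluster is empty
        new_anchors = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else anchors[i]
            for i, c in enumerate(clusters)
        ]
        if new_anchors == anchors:  # assignments stopped changing
            break
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])


def mean_best_iou(boxes, anchors):
    """Average IoU between each box and its best-matching anchor."""
    return sum(max(wh_iou(b, a) for a in anchors) for b in boxes) / len(boxes)
```

Under this assumption, one could call mean_best_iou(boxes, kmeans_anchors(boxes, k)) for several k on the training boxes and pick the k beyond which the average IoU stops improving noticeably; small faces benefit when some anchors settle on very small sizes.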
References
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016, pp. 779–788 (2016)
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. https://arxiv.org/abs/1804.02767. Accessed 8 Aug 2019
Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional networks. In: Advances in Neural Information Processing Systems 29 (NIPS 2016) (2016)
Zhang, S., Zhu, X., Lei, Z., Shi, H.: S3FD: single shot scale-invariant face detector. In: ICCV (2017). https://doi.org/10.1109/iccv.2017.30
Najibi, M., Samangouei, P., Chellappa, R., Davis, L.: SSH: single-stage headless face detector. In: ICCV, pp. 4885–4894 (2017)
Wang, H., Li, Z., Ji, X., Wang, Y.: Face R-CNN. arXiv:1706.01061 (2017)
Zhu, C., Zheng, Y., Luu, K., Savvides, M.: CMS-RCNN: contextual multi-scale region-based CNN for unconstrained face detection. In: Bhanu, B., Kumar, A. (eds.) Deep Learning for Biometrics. ACVPR, pp. 57–79. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-61657-5_3
Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett. 23(10), 1499–1503 (2016)
Yang, S., Luo, P., Loy, C.-C., Tang, X.: From facial parts responses to face detection: a deep learning approach. In: ICCV (2015)
Yang, S., Luo, P., Loy, C.C., Tang, X.: WIDER FACE: a face detection benchmark. In: CVPR, pp. 5525–5533 (2016)
Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., Ren, D.: Distance-IoU loss: faster and better learning for bounding box regression. In: AAAI Conference on Artificial Intelligence (AAAI) (2020)
Xiong, X., De la Torre, F.: Supervised descent method and its applications to face alignment. In: CVPR, pp. 532–539 (2013)
Zhu, X., Lei, Z., Liu, X., Shi, H., Li, S.Z.: Face alignment across large poses: a 3D solution. In: CVPR, pp. 146–155 (2016)
Sun, Y., Wang, X., Tang, X.: Deep learning face representation from predicting 10,000 classes. In: CVPR, pp. 1891–1898 (2014)
Parkhi, O.M., Vedaldi, A., Zisserman, A.: Deep face recognition. In: Proceedings of the British Machine Vision Conference, vol. 1, p. 6 (2015)
Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: CVPR, pp. 815–823 (2015)
Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: CVPR, vol. 1, p. I–511. IEEE (2001)
Viola, P., Jones, M.J.: Robust real-time face detection. Int. J. Comput. Vis. 57(2), 137–154 (2004). https://doi.org/10.1023/B:VISI.0000013087.49260.fb
He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. TPAMI 37(9), 1904–1916 (2015)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR, pp. 580–587 (2014)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems (NIPS), pp. 91–99 (2015)
Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: CVPR, pp. 7263–7271 (2017)
Fu, C., Liu, W., Ranga, A., Tyagi, A., Berg, A.C.: DSSD: deconvolutional single shot detector. https://arxiv.org/abs/1701.06659. Accessed 8 Aug 2019
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Computer Vision and Pattern Recognition (CVPR) (2005)
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: CVPR (2015)
Jiang, H., Learned-Miller, E.: Face detection with the Faster R-CNN. arXiv:1606.03473 (2016)
Cai, Z., Fan, Q., Feris, R.S., Vasconcelos, N.: A unified multi-scale deep convolutional neural network for fast object detection. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 354–370. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_22
Tang, X., Du, D.K., He, Z., Liu, J.: PyramidBox: a context-assisted single shot face detector. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 812–828. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_49
Yang, S., Luo, P., Loy, C.C., Tang, X.: Faceness-Net: face detection through deep facial part responses. IEEE Trans. Pattern Anal. Mach. Intell. 40(8), 1845–1859 (2018)
Ju, M., Luo, H., Wang, Z., Hui, B., Chang, Z.: The application of improved YOLO V3 in multiscale target detection. Appl. Sci. 9, 3775 (2019). https://doi.org/10.3390/app9183775
Jocher, G.: Ultralytics LLC YOLOv3. https://github.com/ultralytics/yolov3
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Tuli, S.H., Mao, A., Liu, W. (2020). A Novel Face Detector Based on YOLOv3. In: Gallagher, M., Moustafa, N., Lakshika, E. (eds) AI 2020: Advances in Artificial Intelligence. AI 2020. Lecture Notes in Computer Science, vol. 12576. Springer, Cham. https://doi.org/10.1007/978-3-030-64984-5_5
DOI: https://doi.org/10.1007/978-3-030-64984-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64983-8
Online ISBN: 978-3-030-64984-5
eBook Packages: Computer Science, Computer Science (R0)