Abstract
Face detection is one of the important tasks of object detection. Typically detection is the first stage of pattern recognition and identity authentication. In recent years, deep learning-based algorithms in object detection have grown rapidly. These algorithms can be generally divided into two categories, i.e., two-stage detector like Faster R-CNN and one-stage detector like YOLO. Although YOLO and its varieties are not so good as two-stage detectors in terms of accuracy, they outperform the counterparts by a large margin in speed. YOLO performs well when facing normal size objects, but is incapable of detecting small objects. The accuracy decreases notably when dealing with objects that have large-scale changing like faces. Aimed to solve the detection problem of varying face scales, we propose a face detector named YOLO-face based on YOLOv3 to improve the performance for face detection. The present approach includes using anchor boxes more appropriate for face detection and a more precise regression loss function. The improved detector significantly increased accuracy while remaining fast detection speed. Experiments on the WIDER FACE and the FDDB datasets show that our improved algorithm outperforms YOLO and its varieties.
Similar content being viewed by others
References
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105. Curran Associates, Inc., New York (2012)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: CVPR (1), vol. 1, pp. 511–518 (2001)
Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010). https://doi.org/10.1109/tpami.2009.167
He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015)
Girshick, R.: Fast r-CNN. arXiv preprint arXiv:1504.08083 (2015)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing, pp. 91–99 (2015)
Dai, J., Li, Y., He, K., Sun, J.: R-fcn: object detection via region-based fully convolutional networks. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29, pp. 379–387. Curran Associates, Inc., New York (2016)
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: SSD: Single Shot MultiBox Detector. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 21–37 (2016)
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement (2018). arXiv preprint arXiv:1804.02762
Lin, T.Y., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Chen, X., Fang, H., Lin, T., Vedantam, R., Gupta, S., Dollar, P., Zitnick, L.C.: Microsoft COCO captions: data collection and evaluation server (2015). arXiv preprint arXiv:1504.00325
Yang, S., Luo, P., Loy, C.C., Tang, X.: Wider face: a face detection benchmark. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Viola, P., Jones, M.J.: Robust real-time face detection. International Journal of Computer Vision 57(2), 137–154 (2004)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Schmid, C., Soatto, S., Tomasi, C. (eds.) International Conference on Computer Vision and Pattern Recognition (CVPR ’05), vol. 1, pp. 886–893. IEEE Computer Society, San Diego (2005)
Cai, Z., Vasconcelos, N.: Cascade r-CNN: delving into high quality object detection. In: The IEEE Conference on Computer Vision and Pattern (2018)
Neubeck, A., Van Gool, L.: Efficient non-maximum suppression. In: 18th International Conference on Pattern Recognition (ICPR’06), vol. 3, pp 850–855 (2006). https://doi.org/10.1109/ICPR.2006.479
Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett. 23(10), 1499–1503 (2016)
Huang, L., Yang, Y., Deng, Y., Yu, Y.: DenseBox: unifying landmark localization with end to end object detection (2015). arXiv preprint arXiv:1509.04874
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
Yang, S., Luo, P., Loy, C.C., Tang, X.: Faceness-net: face detection through deep facial part responses. IEEE Trans. Pattern Anal. Mach. Intell. 40(8), 1845–1859 (2018). https://doi.org/10.1109/TPAMI.2017.2738644
Hu, P., Ramanan, D.: Finding tiny faces. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)(2017)
Wang, H., Li, Z., Ji, X., Wang, Y.: Face R-CNN (2017). arXiv preprint arXiv:1706.01061
Najibi, M., Samangouei, P., Chellappa, R., Davis, L.S.: SSH: single stage headless face detector. In: The IEEE International Conference on Computer Vision (ICCV) (2017)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv preprint arXiv:1409.1556
Zhang, C., Xu, X., Tu, D.: Face detection using improved faster RCNN (2018). arXiv preprint arXiv:1802.02142
Wang, Y., Ji, X., Zhou, Z., Wang, H., Li, Z.: Detecting faces using region-based fully convolutional networks (2017). arXiv preprint arXiv:1709.05256
Wang, J., Yuan, Y., Yu, G.: Face attention network: an effective face detector for the occluded faces (2017). arXiv preprint arXiv:1711.07246
Tang, X., Du, D.K., He, Z., Liu, J.: Pyramidbox: a context-assisted single shot face detector. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 797–813 (2018)
Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov. 2(2), 121–167 (1998)
Redmon, J., Farhadi, A.: Yolo9000: better, faster, stronger. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Everingham, M., Eslami, S.M.A., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The Pascal visual object classes challenge: a retrospective. Int. J. Comput. Vis. 111(1), 98–136 (2014)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., Savarese, S.: Generalized intersection over union: a metric and a loss for bounding box regression. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Acknowledgements
We wish to acknowledge Qinglin Ran, Kuo Zhang and Canwei Zhang for their advices and discussions for this work.
Funding
This work is supported by the Beijing municipal education committee scientific and technological planning Project (KM201811232024, KM201611232022) and Beijing excellent talents youth backbone Project (9111524401).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Chen, W., Huang, H., Peng, S. et al. YOLO-face: a real-time face detector. Vis Comput 37, 805–813 (2021). https://doi.org/10.1007/s00371-020-01831-7
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00371-020-01831-7