YOLO-face: a real-time face detector

Chen, Weijun; Huang, Hongbo; Peng, Shuai; Zhou, Changsheng; Zhang, Cuiping

doi:10.1007/s00371-020-01831-7

Original Article
Published: 12 March 2020

The Visual Computer, volume 37, pages 805–813 (2021)

Weijun Chen¹, Hongbo Huang¹,², Shuai Peng¹, Changsheng Zhou¹,² & Cuiping Zhang¹,²

Abstract

Face detection is one of the important tasks in object detection. Detection is typically the first stage of pattern recognition and identity authentication. In recent years, deep learning-based object detection algorithms have developed rapidly. These algorithms can generally be divided into two categories: two-stage detectors such as Faster R-CNN and one-stage detectors such as YOLO. Although YOLO and its variants are less accurate than two-stage detectors, they outperform them by a large margin in speed. YOLO performs well on normal-size objects but has difficulty detecting small objects, and its accuracy decreases notably on objects whose scale varies widely, such as faces. To address the problem of detecting faces at varying scales, we propose YOLO-face, a face detector based on YOLOv3. The approach uses anchor boxes better suited to face detection and a more precise regression loss function. The improved detector is significantly more accurate while retaining a fast detection speed. Experiments on the WIDER FACE and FDDB datasets show that the improved algorithm outperforms YOLO and its variants.
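
The abstract does not name the regression loss it adopts. Purely as an illustration of what a more precise box regression loss can look like, the sketch below computes a loss based on Generalized IoU (GIoU), which scores a predicted box against a ground-truth box by their overlap rather than by penalizing each coordinate independently. The corner format `(x1, y1, x2, y2)` and the function name `giou_loss` are assumptions made for this example and are not taken from the paper.

```python
def giou_loss(pred, target):
    """Return 1 - GIoU for two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; not the paper's implementation.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Areas of the two boxes (clamped so malformed boxes count as empty).
    area_p = max(0.0, px2 - px1) * max(0.0, py2 - py1)
    area_t = max(0.0, tx2 - tx1) * max(0.0, ty2 - ty1)

    # Intersection rectangle; zero area when the boxes do not overlap.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = area_p + area_t - inter

    # Smallest axis-aligned box that encloses both inputs.
    enclose_w = max(px2, tx2) - min(px1, tx1)
    enclose_h = max(py2, ty2) - min(py1, ty1)
    enclose = enclose_w * enclose_h

    # Assumption: degenerate (zero-area) inputs are treated as zero overlap.
    if union <= 0.0 or enclose <= 0.0:
        return 1.0

    iou = inter / union
    # GIoU subtracts the fraction of the enclosing box not covered by the union,
    # so non-overlapping boxes still get a signal that depends on their distance.
    giou = iou - (enclose - union) / enclose
    return 1.0 - giou


if __name__ == "__main__":
    print(giou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
    print(giou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap
    print(giou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint boxes
```

Unlike plain IoU, this quantity keeps changing as two disjoint boxes move apart, which is one reason overlap-based losses of this kind are used for boxes that are small relative to the image, as faces often are.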

Acknowledgements

We wish to thank Qinglin Ran, Kuo Zhang and Canwei Zhang for their advice and discussions on this work.

Funding

This work is supported by the Beijing Municipal Education Committee Scientific and Technological Planning Project (KM201811232024, KM201611232022) and the Beijing Excellent Talents Youth Backbone Project (9111524401).

Author information

Authors and Affiliations

Computer School, Beijing Information Science and Technology University, Beijing, 100101, China
Weijun Chen, Hongbo Huang, Shuai Peng, Changsheng Zhou & Cuiping Zhang
Institute of Computing Intelligence, Beijing Information Science and Technology University, Beijing, 100192, China
Hongbo Huang, Changsheng Zhou & Cuiping Zhang

Corresponding author

Correspondence to Hongbo Huang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Chen, W., Huang, H., Peng, S. et al. YOLO-face: a real-time face detector. Vis Comput 37, 805–813 (2021). https://doi.org/10.1007/s00371-020-01831-7

Published: 12 March 2020
Issue Date: April 2021
DOI: https://doi.org/10.1007/s00371-020-01831-7
