Abstract
As an active research topic in computer vision and information security, face detection has developed considerably over the past few decades. However, most existing detection methods only localize a bounding box, so the extracted face features contain background noise and detection accuracy is limited. To overcome these drawbacks, this paper proposes G-Mask, a face detection and segmentation method based on Mask R-CNN that uses Generalized Intersection over Union (GIoU). In this method, ResNet-101 extracts features, a Region Proposal Network (RPN) generates RoIs, and RoIAlign faithfully preserves exact spatial locations so that a Fully Convolutional Network (FCN) can generate a binary mask for each face. In particular, to improve performance on multi-scale face detection, we use GIoU as the bounding box loss function. Furthermore, we construct a new face dataset with segmentation annotations to train the model. Experimental results on the well-known FDDB and AFW benchmarks show that G-Mask achieves promising face detection performance compared with Faster R-CNN and the original Mask R-CNN, while also performing instance-level face segmentation during detection.
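The abstract uses GIoU as the bounding box loss. The sketch below follows the GIoU definition of Rezatofighi et al. (CVPR 2019) for a single pair of axis-aligned boxes: GIoU = IoU − |C \ (A ∪ B)| / |C|, where C is the smallest box enclosing both, and the loss is 1 − GIoU. It assumes boxes in (x1, y1, x2, y2) corner format; the function name `giou_loss` is hypothetical and not taken from the authors' code, and a real Mask R-CNN training loop would compute this on batched tensors rather than Python tuples.

```python
def giou_loss(pred, target):
    """GIoU loss (1 - GIoU) for two axis-aligned boxes in (x1, y1, x2, y2) format.

    Hypothetical helper for illustration; assumes x2 > x1 and y2 > y1.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    if px2 <= px1 or py2 <= py1 or tx2 <= tx1 or ty2 <= ty1:
        raise ValueError("boxes must have positive width and height")

    area_p = (px2 - px1) * (py2 - py1)
    area_t = (tx2 - tx1) * (ty2 - ty1)

    # Intersection rectangle (zero width/height if the boxes do not overlap).
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = area_p + area_t - inter
    iou = inter / union

    # Smallest enclosing box C of both boxes.
    area_c = (max(px2, tx2) - min(px1, tx1)) * (max(py2, ty2) - min(py1, ty1))

    # GIoU subtracts the fraction of C not covered by the union.
    giou = iou - (area_c - union) / area_c
    return 1.0 - giou


# Two disjoint predictions have IoU = 0, so a plain 1 - IoU loss is 1 for both;
# GIoU still penalizes the farther box more heavily.
near = giou_loss((0, 0, 1, 1), (2, 0, 3, 1))  # 4/3
far = giou_loss((0, 0, 1, 1), (5, 0, 6, 1))   # 5/3
print(near, far)
```

The non-zero signal for non-overlapping boxes is the usual motivation for GIoU over an IoU-based regression loss; how much it contributes to the multi-scale results reported here is a question for the paper's experiments, not this sketch.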
Acknowledgement
This work was partly supported by the National Natural Science Foundation of China (61772144, 61672008), the Innovation Team Project (Natural Science) of the Education Department of Guangdong Province (2017KCXTD021), the Foundation for Youth Innovation Talents in Higher Education of Guangdong Province (2018KQNCX139), the Innovation Research Project (Natural Science) of the Education Department of Guangdong Province (2016KTSCX077), the Project for Distinctive Innovation of Ordinary Universities of Guangdong Province (2018KTSCX120), and the Foreign Science and Technology Cooperation Plan Project of the Guangzhou Science Technology and Innovation Commission (201807010059).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Lin, K. et al. (2020). Face Detection and Segmentation with Generalized Intersection over Union Based on Mask R-CNN. In: Ren, J., et al. (eds) Advances in Brain Inspired Cognitive Systems. BICS 2019. Lecture Notes in Computer Science, vol. 11691. Springer, Cham. https://doi.org/10.1007/978-3-030-39431-8_11
DOI: https://doi.org/10.1007/978-3-030-39431-8_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39430-1
Online ISBN: 978-3-030-39431-8
eBook Packages: Computer Science, Computer Science (R0)