Abstract
This paper presents an approach to reliable dogface detection in images using convolutional neural networks. Two detectors were trained on a dataset containing 8351 real-world images of different dog breeds. The first detector achieved an average precision of 0.79 while running in real time on a single CPU; the second achieved an average precision of 0.98 but requires more processing time. Subsequently, a facial landmark detector based on a cascade of regressors, similar to those commonly used in human face detection, was proposed. The proposed algorithm detects the dog’s eyes, muzzle, top of the head and inner bases of the ears with a median location error of 0.05, normalized by the inter-ocular distance. The proposed two-step technique (dogface detection followed by facial landmark detection) could be used for dog breed identification and subsequent auto-tagging and image search. The paper also demonstrates a real-world application of the proposed technique: a successful supporting system for taking pictures of dogs facing the camera.
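The landmark accuracy quoted above is a median location error normalized by the inter-ocular distance. A minimal NumPy sketch of such a metric follows. It assumes that each error is the Euclidean distance between a predicted and a ground-truth point, divided by the ground-truth distance between the two eyes of the same image, and that the median is pooled over all landmarks of all test images; the paper's exact pooling is not stated here. The function name, the landmark order and the eye-index arguments are hypothetical.

```python
import numpy as np


def median_normalized_error(pred, truth, left_eye_idx, right_eye_idx):
    """Median landmark error normalized by the inter-ocular distance.

    pred, truth: arrays of shape (n_images, n_landmarks, 2), in pixels.
    left_eye_idx, right_eye_idx: hypothetical positions of the eye
    landmarks in the landmark ordering.
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    # Euclidean error of every landmark, shape (n_images, n_landmarks).
    errors = np.linalg.norm(pred - truth, axis=-1)
    # Ground-truth inter-ocular distance per image, shape (n_images,).
    iod = np.linalg.norm(truth[:, left_eye_idx] - truth[:, right_eye_idx], axis=-1)
    # Normalize each image's errors by its own inter-ocular distance.
    normalized = errors / iod[:, None]
    # Pool over all landmarks and images (an assumption, see lead-in).
    return float(np.median(normalized))


# One image, landmarks in the hypothetical order (left eye, right eye, muzzle).
truth = np.array([[[0.0, 0.0], [100.0, 0.0], [50.0, 50.0]]])
pred = np.array([[[0.0, 5.0], [100.0, 0.0], [50.0, 60.0]]])
print(median_normalized_error(pred, truth, left_eye_idx=0, right_eye_idx=1))  # 0.05
```

In the example the three normalized errors are 0.05, 0.0 and 0.1, so the median is 0.05. Normalizing by the inter-ocular distance makes the score independent of how large the dog's face appears in the image, and a median is less sensitive than a mean to a few badly misplaced points.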
Notes
1. The models are pre-trained on large public datasets and are available online. For example, the list of checkpoints provided by the TensorFlow Object Detection API can be found at https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md.
2. The dataset is available at http://faceserv.cs.columbia.edu/DogData/.
3. COCO dataset – Common Objects in Context, available online at http://cocodataset.org/.
4. VOC2012 dataset – Visual Object Classes Challenge 2012, available online at http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html.
5. arXiv is an e-print service operated by Cornell University. It is available at https://arxiv.org/.
6. The ReLU6 activation function computes min(max(features, 0), 6); for more information about this technique, see the manuscript [13].
7. Atrous convolution is also known as convolution with holes or dilated convolution; the name comes from the French “à trous”, meaning “with holes”. For a description of atrous convolution and how it can be used for dense feature extraction, see [16].
8. For more information about the Dlib library, visit http://blog.dlib.net/.
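Note 6 defines ReLU6 by a formula. The following is a minimal NumPy sketch of that formula, for illustration only; the function name is hypothetical, and in TensorFlow the same operation is provided as `tf.nn.relu6`.

```python
import numpy as np


def relu6(features):
    """Element-wise ReLU6: min(max(features, 0), 6)."""
    features = np.asarray(features, dtype=float)
    # Clip negatives to 0 (plain ReLU), then cap positives at 6.
    return np.minimum(np.maximum(features, 0.0), 6.0)


print(relu6([-3.0, 0.0, 2.5, 6.0, 10.0]).tolist())  # [0.0, 0.0, 2.5, 6.0, 6.0]
```

Compared with plain ReLU, the upper cap keeps activations in the fixed range [0, 6]; this bounded range is commonly cited as making the function friendly to low-precision arithmetic in mobile networks such as MobileNets.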
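Note 7 refers to atrous (dilated) convolution. To make the “holes” concrete, the sketch below implements it in one dimension with NumPy, assuming stride 1, “valid” padding and the cross-correlation convention (no kernel flip) used by most CNN layers. The function name is hypothetical; the detectors themselves would use the 2-D operations of a deep learning framework. With rate r, the kernel taps are spaced r samples apart, so a kernel of size k covers (k − 1)·r + 1 input samples without adding any weights.

```python
import numpy as np


def atrous_conv1d(signal, kernel, rate=1):
    """1-D atrous (dilated) convolution, stride 1, 'valid' padding.

    Uses cross-correlation (no kernel flip), as CNN layers do.
    """
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    k = kernel.size
    span = (k - 1) * rate + 1  # receptive field of the dilated kernel
    out_len = signal.size - span + 1
    if out_len <= 0:
        raise ValueError("signal is shorter than the dilated kernel")
    out = np.empty(out_len)
    for i in range(out_len):
        # Take every `rate`-th input sample; the skipped ones are the "holes".
        out[i] = np.dot(signal[i:i + span:rate], kernel)
    return out


x = np.arange(7)  # [0, 1, 2, 3, 4, 5, 6]
print(atrous_conv1d(x, [1, 1, 1], rate=2).tolist())  # [6.0, 9.0, 12.0]

# Same result as an ordinary convolution with zeros inserted between the taps.
print(atrous_conv1d(x, [1, 0, 1, 0, 1], rate=1).tolist())  # [6.0, 9.0, 12.0]
```

The second call shows the usual way to think about the technique: a rate-2 atrous kernel [1, 1, 1] behaves like the ordinary kernel [1, 0, 1, 0, 1], so the receptive field grows from 3 to 5 samples while the number of weights stays at 3.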
References
1. Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., et al.: Speed/Accuracy trade-offs for modern convolutional object detectors. In: CVPR, vol. abs/1611.10012, pp. 3296–3297. IEEE (2017). https://doi.org/10.1109/cvpr.2017.351
2. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
3. Han, S., Pool, J., Narang, S., Mao, H., Gong, E., Tang, S., et al.: DSD: dense-sparse-dense training for deep neural networks. CoRR, vol. abs/1607.04381 (2016)
4. Liu, J., Kanazawa, A., Jacobs, D., Belhumeur, P.: Dog breed classification using part localization. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7572, pp. 172–185. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33718-5_13
5. Spady, T.C., Ostrander, E.A.: Canine behavioral genetics: pointing out the phenotypes and herding up the genes. Am. J. Hum. Genet. 82, 10–18 (2008). https://doi.org/10.1016/j.ajhg.2007.12.001
6. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 1, pp. 886–893, June 2005. https://doi.org/10.1109/cvpr.2005.177
7. Cuimei, L., Zhiliang, Q., Nan, J., Jianhua, W.: Human face detection algorithm via Haar cascade classifier combined with three additional classifiers. In: ICEMI, pp. 483–487. IEEE (2017). https://doi.org/10.1109/icemi.2017.8265863
8. Parkhi, O.M., Vedaldi, A., Jawahar, C.V., Zisserman, A.: The truth about cats and dogs. In: 2011 International Conference on Computer Vision, pp. 1427–1434, November 2011. https://doi.org/10.1109/iccv.2011.6126398
9. Parkhi, O.M., Vedaldi, A., Zisserman, A., Jawahar, C.V.: Cats and dogs. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3498–3505, June 2012. https://doi.org/10.1109/cvpr.2012.6248092
10. Zhang, W., Sun, J., Tang, X.: Cat head detection - how to effectively exploit shape and texture features. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5305, pp. 802–816. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88693-8_59
11. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
12. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. CoRR, vol. abs/1704.04861 (2017)
13. Ranzato, M., Krizhevsky, A., Hinton, G.E.: Convolutional Deep Belief Networks on CIFAR-10, Toronto (2010). Unpublished manuscript
14. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1 (2016). https://doi.org/10.1109/tpami.2016.2577031
15. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, June 2016. https://doi.org/10.1109/cvpr.2016.90
16. Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Semantic image segmentation with deep convolutional nets and fully connected CRFs. In: Computer Vision and Pattern Recognition, vol. abs/1412.7062 (2014)
17. Kazemi, V., Sullivan, J.: One millisecond face alignment with an ensemble of regression trees. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1867–1874, June 2014. https://doi.org/10.1109/cvpr.2014.241
18. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, New York (2009). https://doi.org/10.1007/978-0-387-84858-7. ISBN 978-0-387-84858-7
Acknowledgements
This work was supported by the Ministry of Education, Youth and Sports of the Czech Republic within the National Sustainability Programme Project no. LO1303 (MSMT-7778/2014), by the European Regional Development Fund under the Project CEBIA-Tech no. CZ.1.05/2.1.00/03.0089, and by the Internal Grant Agency of Tomas Bata University under Project no. IGA/CebiaTech/2018/003. This work was also supported by COST (European Cooperation in Science & Technology) under Action CA15140, Improving Applicability of Nature-Inspired Optimisation by Joining Theory and Practice (ImAppNIO), and Action IC1406, High-Performance Modelling and Simulation for Big Data Applications (cHiPSet). The work was further supported by resources of the A.I. Lab at the Faculty of Applied Informatics, Tomas Bata University in Zlin (ailab.fai.utb.cz).
Copyright information
© 2019 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Vlachynska, A., Oplatkova, Z.K., Turecek, T. (2019). Dogface Detection and Localization of Dogface’s Landmarks. In: Silhavy, R. (eds) Artificial Intelligence and Algorithms in Intelligent Systems. CSOC2018 2018. Advances in Intelligent Systems and Computing, vol 764. Springer, Cham. https://doi.org/10.1007/978-3-319-91189-2_46
DOI: https://doi.org/10.1007/978-3-319-91189-2_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91188-5
Online ISBN: 978-3-319-91189-2
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)