Abstract
Object detection is one of the most fundamental problems in computer vision and has received unprecedented attention in the past decades. Although state-of-the-art detectors achieve promising results on public evaluation datasets, their performance may still degrade when they face certain challenging factors. In contrast, evidence suggests that the human visual system is more robust and reliable than deep neural networks (DNNs). In this paper, we study and compare deep neural networks and the human visual system on the generic object detection task. Our purpose is to evaluate the capability of DNN detectors, benchmarked against humans, with a proposed object detection Turing test. Experiments show that a tremendous performance gap still exists between object detectors and the human visual system. Compared with humans, DNN detectors have more difficulty detecting uncommon, camouflaged, or sheltered objects. We hope that the findings of this paper will facilitate studies on object detection. We also envision that human visual capability will become a valuable reference for evaluating DNN detectors in the future.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, Q., Zhang, J., Zhao, X., Huang, K. (2021). Can DNN Detectors Compete Against Human Vision in Object Detection Task? In: Ma, H., et al. Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol 13019. Springer, Cham. https://doi.org/10.1007/978-3-030-88004-0_44
DOI: https://doi.org/10.1007/978-3-030-88004-0_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88003-3
Online ISBN: 978-3-030-88004-0
eBook Packages: Computer Science (R0)