Skip to main content

Can DNN Detectors Compete Against Human Vision in Object Detection Task?

  • Conference paper
  • First Online:
Pattern Recognition and Computer Vision (PRCV 2021)

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 13019))

Included in the following conference series:

Abstract

Object detection is the most fundamental problem in computer vision, which receives unprecedented attention in past decades. Although state-of-the-art detectors achieve promising results on public evaluation datasets, they may still suffer performance degradation problems dealing with some specific challenging factors. In contrast, evidences suggest that human vision system exhibits greater robustness and reliability compared with the deep neural networks. In this paper, we study and compare deep neural networks and human visual system on generic object detection task. The purpose of this paper is to evaluate the capability of DNN detectors benchmarked against human, with a proposed object detection Turing test. Experiments show that there still exists a tremendous performance gap between object detectors and human vision system. DNN detectors encounter more difficulties to effectively detect uncommon, camouflaged or sheltered objects compared with human. It’s hoped that findings of this paper could facilitate studies on object detection. We also envision that human visual capability is expected to become a valuable reference to evaluate DNN detectors in the future.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 79.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 99.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. https://www.flir.com/oem/adas/adas-dataset-form/

  2. Biederman, I.: Recognition-by-components: a theory of human image understanding. Psychol. Rev. 94(2), 115 (1987)

    Article  Google Scholar 

  3. Borji, A., Itti, L.: Human vs. computer in scene and object recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 113–120 (2014)

    Google Scholar 

  4. Chow, K.H., Liu, L., Gursoy, M.E., Truex, S., Wei, W., Wu, Y.: Understanding object detection through an adversarial lens. In: European Symposium on Research in Computer Security, pp. 460–481 (2020)

    Google Scholar 

  5. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886–893 (2005)

    Google Scholar 

  6. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009)

    Google Scholar 

  7. Dodge, S., Karam, L.: Can the early human visual system compete with deep neural networks? In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 2798–2804 (2017)

    Google Scholar 

  8. Everingham, M., Eslami, S.A., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes challenge: a retrospective. Int. J. Comput. Vis. 111(1), 98–136 (2015)

    Article  Google Scholar 

  9. Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010)

    Article  Google Scholar 

  10. Fan, D.P., Ji, G.P., Sun, G., Cheng, M.M., Shen, J., Shao, L.: Camouflaged object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2777–2787 (2020)

    Google Scholar 

  11. Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2008)

    Google Scholar 

  12. Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2009)

    Article  Google Scholar 

  13. Firestone, C.: Performance vs. competence in human-machine comparisons. Proc. Natl. Acad. Sci. 117(43), 26562–26571 (2020)

    Article  Google Scholar 

  14. Gao, H., Mao, J., Zhou, J., Huang, Z., Wang, L., Xu, W.: Are you talking to a machine? dataset and methods for multilingual image question. Adv. Neural Inf. Process. Syst. 28, 2296–2304 (2015)

    Google Scholar 

  15. Geirhos, R., Janssen, D.H., SchĂ¼tt, H.H., Rauber, J., Bethge, M., Wichmann, F.A.: Comparing deep neural networks against humans: object recognition when the signal gets weaker. arXiv preprint arXiv:1706.06969 (2017)

  16. Geirhos, R., Temme, C.R.M., Rauber, J., SchĂ¼tt, H.H., Bethge, M., Wichmann, F.A.: Generalisation in humans and deep neural networks. arXiv preprint arXiv:1808.08750 (2018)

  17. Geman, D., Geman, S., Hallonquist, N., Younes, L.: Visual turing test for computer vision systems. Proc. Natl. Acad. Sci. 112(12), 3618–3623 (2015)

    Article  Google Scholar 

  18. Girshick, R.: Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)

    Google Scholar 

  19. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)

    Google Scholar 

  20. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)

  21. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015)

    Article  Google Scholar 

  22. Hu, B.G., Dong, W.M.: A design of human-like robust ai machines in object identification. arXiv preprint arXiv:2101.02327 (2021)

  23. Huang, K.Q., Tan, T.N.: Review on computational model for vision. Pattern Recogn. Artif. Intell. 26(10), 951–958 (2013)

    Google Scholar 

  24. Kuznetsova, A., et al.: The open images dataset v4. Int. J. Comput. Vis. 128, 1–26 (2020)

    Article  Google Scholar 

  25. Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015)

    Article  MathSciNet  Google Scholar 

  26. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)

    Article  Google Scholar 

  27. Lin, T.Y., Goyal, P., Girshick, R., He, K., DollĂ¡r, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)

    Google Scholar 

  28. Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48

    Chapter  Google Scholar 

  29. Liu, L., et al.: Deep learning for generic object detection: a survey. Int. J. Comput. Vis. 128(2), 261–318 (2020)

    Article  Google Scholar 

  30. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2

    Chapter  Google Scholar 

  31. Loh, Y.P., Chan, C.S.: Getting to know low-light images with the exclusively dark dataset. Comput. Vis. Image Underst 178, 30–42 (2019)

    Article  Google Scholar 

  32. Malinowski, M., Fritz, M.: A multi-world approach to question answering about real-world scenes based on uncertain input. In: Proceedings of the 27th International Conference on Neural Information Processing Systems, pp. 1682–1690 (2014)

    Google Scholar 

  33. Malinowski, M., Rohrbach, M., Fritz, M.: Ask your neurons: a neural-based approach to answering questions about images. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1–9 (2015)

    Google Scholar 

  34. Michaelis, C., et al.: Benchmarking robustness in object detection: autonomous driving when winter is coming. arXiv preprint arXiv:1907.07484 (2019)

  35. Qi, H., Wu, T., Lee, M.W., Zhu, S.C.: A restricted visual turing test for deep scene and event understanding. arXiv preprint arXiv:1512.01715 (2015)

  36. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)

    Google Scholar 

  37. Redmon, J., Farhadi, A.: Yolo9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017)

    Google Scholar 

  38. Redmon, J., Farhadi, A.: Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)

  39. Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017)

    Article  Google Scholar 

  40. Russakovsky, O., et al.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)

    Article  MathSciNet  Google Scholar 

  41. Shan, Q., Adams, R., Curless, B., Furukawa, Y., Seitz, S.M.: The visual turing test for scene reconstruction. In: International Conference on 3D Vision, pp. 25–32 (2013)

    Google Scholar 

  42. Tian, Z., Shen, C., Chen, H., He, T.: Fcos: fully convolutional one-stage object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9627–9636 (2019)

    Google Scholar 

  43. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. I (2001)

    Google Scholar 

  44. Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2 (2019). https://github.com/facebookresearch/detectron2

  45. Xue, A.: End-to-end Chinese landscape painting creation using generative adversarial networks. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3863–3871 (2021)

    Google Scholar 

  46. Zhang, R., Isola, P., Efros, A.A.: Colorful image colorization. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 649–666. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_40

    Chapter  Google Scholar 

  47. Zhao, Z.Q., Zheng, P., Xu, S.T., Wu, X.: Object detection with deep learning: a review. IEEE Trans. Neural Netw. Learn. Syst. 30(11), 3212–3232 (2019)

    Article  Google Scholar 

  48. Zheng, N.N., et al.: Hybrid-augmented intelligence: collaboration and cognition. Front. Inf. Technol. Electron. Eng. 18(2), 153–179 (2017)

    Article  Google Scholar 

  49. Zhou, Z., Firestone, C.: Humans can decipher adversarial images. Nature Commun. 10(1), 1–9 (2019)

    Google Scholar 

  50. Zhu, P., et al.: Visdrone-det2018: the vision meets drone object detection in image challenge results. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops (2018)

    Google Scholar 

  51. Zhu, W., Wang, X., Gao, W.: Multimedia intelligence: when multimedia meets artificial intelligence. IEEE Trans. Multimedia 22(7), 1823–1835 (2020)

    Article  Google Scholar 

  52. Zou, Z., Shi, Z., Guo, Y., Ye, J.: Object detection in 20 years: a survey. arXiv preprint arXiv:1905.05055 (2019)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Qiaozhe Li .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Li, Q., Zhang, J., Zhao, X., Huang, K. (2021). Can DNN Detectors Compete Against Human Vision in Object Detection Task?. In: Ma, H., et al. Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science(), vol 13019. Springer, Cham. https://doi.org/10.1007/978-3-030-88004-0_44

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-88004-0_44

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88003-3

  • Online ISBN: 978-3-030-88004-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics