DOI: 10.1145/3484824.3484889
research-article

Object Detection using Deep Learning: A Review

Published: 13 January 2022

ABSTRACT

Object detection is one of the most critical and challenging tasks in computer vision: it finds objects belonging to predefined categories and determines their locations in an image or video. This paper reviews deep learning-based object detection models and the benchmark datasets used to evaluate them, and compares the performance of different detectors across those datasets in terms of mean Average Precision (mAP). It also presents applications of object detection, such as pedestrian detection, autonomous driving, and face detection, and closes with a discussion of future directions for new object detection techniques.
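
As a rough illustration of the mAP metric named in the abstract, the sketch below computes intersection over union (IoU), a per-class Average Precision, and their mean. It is not taken from the paper: the function names, the `(x_min, y_min, x_max, y_max)` box format, the 0.5 IoU threshold, and the greedy matching rule are all assumptions chosen for brevity, and benchmark toolkits differ in these details.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x_min, y_min, x_max, y_max)
Detection = Tuple[float, Box]            # (confidence score, predicted box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: List[Detection], gts: List[Box], thr: float = 0.5) -> float:
    """Area under the interpolated precision-recall curve for one class."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    hits = []
    # Visit detections from most to least confident.
    for _, box in sorted(dets, key=lambda d: d[0], reverse=True):
        best, best_iou = -1, thr
        for i, gt in enumerate(gts):
            overlap = iou(box, gt)
            if not matched[i] and overlap >= best_iou:
                best, best_iou = i, overlap
        if best >= 0:
            matched[best] = True  # each ground-truth box is matched at most once
            hits.append(1)        # true positive
        else:
            hits.append(0)        # false positive
    # Precision and recall after each ranked detection.
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / rank)
        recalls.append(tp / len(gts))
    # Interpolate: precision at rank i becomes the best precision at rank >= i.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas under the interpolated curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class: Dict[str, Tuple[List[Detection], List[Box]]]) -> float:
    """Mean of the per-class AP values."""
    aps = [average_precision(d, g) for d, g in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

For example, with two ground-truth boxes and three ranked detections of which the second is a false positive, `average_precision` returns 5/6. Precision is interpolated to 1.0 over the first half of the recall range and to 2/3 over the second half.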

Published in

DSMLAI '21: Proceedings of the International Conference on Data Science, Machine Learning and Artificial Intelligence
August 2021
415 pages
ISBN: 9781450387637
DOI: 10.1145/3484824

Copyright © 2021 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited
