ABSTRACT
Object detection is one of the most critical and challenging tasks in computer vision: it finds objects belonging to predefined categories and determines their locations in an image or video. This paper reviews deep learning-based object detection models and discusses several benchmark datasets. It then compares the performance of different detectors on these datasets in terms of mean Average Precision (mAP). Because object detection is used in many fields and in many forms, applications such as pedestrian detection, autonomous driving, and face detection are also presented. Finally, directions for future work on new object detection techniques are discussed.
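As an illustration of the mAP metric mentioned above, the following is a minimal sketch of how Average Precision can be computed for one class and then averaged across classes. It is not taken from the paper. It assumes PASCAL VOC-style matching (each detection is compared with its highest-overlap ground-truth box at an IoU threshold of 0.5) and all-point interpolation of the precision-recall curve. The names `iou`, `average_precision`, and `mean_average_precision` and the box layout `(x_min, y_min, x_max, y_max)` are hypothetical choices made for this example.

```python
from typing import Sequence, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: Sequence[Tuple[float, Box]],
    ground_truth: Sequence[Box],
    iou_threshold: float = 0.5,
) -> float:
    """All-point interpolated AP for a single class.

    `detections` holds (confidence, box) pairs; `ground_truth` holds boxes.
    """
    if not ground_truth:
        return 0.0

    matched = [False] * len(ground_truth)
    tp_flags = []
    # Visit detections from most to least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        # True positive only if the best-overlapping box is still unmatched.
        if best_idx >= 0 and best_iou >= iou_threshold and not matched[best_idx]:
            matched[best_idx] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)

    # Precision and recall after each detection in ranked order.
    precisions, recalls = [], []
    true_positives = 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        true_positives += int(is_tp)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / len(ground_truth))

    # Interpolate: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(per_class_ap: Sequence[float]) -> float:
    """mAP is the mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap) if per_class_ap else 0.0
```

Benchmarks differ in the details: some average AP over several IoU thresholds rather than using a single one, so the threshold and interpolation used here should be read as one possible configuration.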
Index Terms
- Object Detection using Deep Learning: A Review