Real-Time Object Detection Based on Convolutional Block Attention Module

Ban, Ming-Yang; Tian, Wei-Dong; Zhao, Zhong-Qiu

doi:10.1007/978-3-030-60796-8_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12465)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

Object detection is one of the most challenging problems in computer vision, and practical detectors must be both accurate and fast enough to run in real time. YOLOv3 is a strong real-time object detection algorithm, but its recall and localization accuracy are insufficient. The attention mechanism in deep learning resembles that of human vision: it focuses on the important parts of a large amount of information, selects the key information, and ignores the rest. In this paper, we integrate the Convolutional Block Attention Module (CBAM) into YOLOv3 to improve detection accuracy while keeping real-time performance. Experiments on the PASCAL VOC and MS-COCO datasets show the effectiveness and accuracy of the proposed method compared with conventional YOLOv3.
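
For readers unfamiliar with CBAM, the following is a minimal NumPy sketch of the general module as described by Woo et al. (channel attention followed by spatial attention), applied to a single feature map. It is not the authors' implementation: the abstract does not say where in YOLOv3 the module is inserted, and the function names, the reduction ratio `r`, the 7x7 kernel size, and the random weights below are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Per-channel weights in (0, 1) for a (C, H, W) feature map.

    w1: (C // r, C) and w2: (C, C // r) form a shared two-layer MLP
    (hypothetical weights; a trained network would learn them).
    """
    avg = feat.mean(axis=(1, 2))          # global average pooling -> (C,)
    mx = feat.max(axis=(1, 2))            # global max pooling     -> (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)   # ReLU hidden layer

    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]   # (C, 1, 1)

def spatial_attention(feat, kernel):
    """Per-location weights in (0, 1) for a (C, H, W) feature map.

    kernel: (2, k, k) convolution filter over the stacked
    channel-wise average and max maps (zero padding, stride 1).
    """
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    _, h, w = feat.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)[None]                                  # (1, H, W)

def cbam(feat, w1, w2, kernel):
    """Refine features: channel attention first, then spatial attention."""
    feat = feat * channel_attention(feat, w1, w2)
    return feat * spatial_attention(feat, kernel)

# Toy usage with random (untrained) weights; all sizes are illustrative.
rng = np.random.default_rng(0)
C, H, W, r = 8, 5, 5, 4
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
out = cbam(feat, w1, w2, kernel)
print(out.shape)
```

Because both attention maps pass through a sigmoid, the module only rescales activations and leaves the feature map's shape unchanged, which is why it can in principle be inserted between existing convolutional blocks of a detector without altering the rest of the architecture.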

Acknowledgement

This research was supported by the National Natural Science Foundation of China (Nos. 61672203, 61976079 & U1836102) and Anhui Natural Science Funds for Distinguished Young Scholar (No. 170808J08).

Author information

Authors and Affiliations

College of Computer and Information, Hefei University of Technology, Hefei, China
Ming-Yang Ban, Wei-Dong Tian & Zhong-Qiu Zhao

Corresponding author

Correspondence to Ming-Yang Ban.

Editor information

Editors and Affiliations

Machine Learning and Systems Biology, Tongji University, Shanghai, China
De-Shuang Huang
School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, North Wollongong, NSW, Australia
Prashan Premaratne

About this paper

Cite this paper

Ban, MY., Tian, WD., Zhao, ZQ. (2020). Real-Time Object Detection Based on Convolutional Block Attention Module. In: Huang, DS., Premaratne, P. (eds) Intelligent Computing Methodologies. ICIC 2020. Lecture Notes in Computer Science, vol 12465. Springer, Cham. https://doi.org/10.1007/978-3-030-60796-8_4

DOI: https://doi.org/10.1007/978-3-030-60796-8_4
Published: 05 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60795-1
Online ISBN: 978-3-030-60796-8
eBook Packages: Computer Science, Computer Science (R0)
