Enhanced SSD with interactive multi-scale attention features for object detection

Zhou, Shuren; Qiu, Jia

doi:10.1007/s11042-020-10191-2

Published: 06 January 2021

Volume 80, pages 11539–11556, (2021)

Multimedia Tools and Applications

Shuren Zhou¹ & Jia Qiu¹

Abstract

The Single Shot MultiBox Detector (SSD) uses multi-scale feature maps for object detection and performs well on this task. However, as a one-stage detector, SSD has difficulty quickly attending to the salient regions of objects in an image. In the SSD network, feature maps of different scales predict objects independently, so low-level and high-level feature maps do not interact. In this paper we propose an enhanced SSD method using interactive multi-scale attention features (MA-SSD). Our method uses an attention mechanism to generate attention features at multiple scales and adds them to the original detection branches of SSD, which strengthens the feature representation and improves detection accuracy. At the same time, the features at different detection scales interact with each other, and all detection branches in our method run in parallel, which preserves detection efficiency. The proposed method achieves competitive performance on the public PASCAL VOC dataset.
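The abstract does not specify the attention module or how the scales interact, so the following is only a toy sketch of the general idea, not the authors' MA-SSD implementation. It assumes single-channel feature maps stored as nested lists, a sigmoid spatial attention, nearest-neighbour upsampling as the cross-scale interaction, and a residual addition back onto the original branch. All function names (spatial_attention, upsample2x, enhance) are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def spatial_attention(fmap):
    """Map every position of a 2D feature map to a weight in (0, 1)."""
    return [[sigmoid(v) for v in row] for row in fmap]


def upsample2x(fmap):
    """Nearest-neighbour upsampling: each cell becomes a 2x2 block."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def enhance(low, high):
    """Add attention features to the low-level map `low` (2H x 2W).

    Assumption: the coarser map `high` (H x W) is upsampled and summed
    with `low` so that the attention weights depend on both scales.
    The weighted features are then added to the original `low` branch.
    """
    context = upsample2x(high)
    mixed = [[l + c for l, c in zip(lr, cr)] for lr, cr in zip(low, context)]
    attn = spatial_attention(mixed)
    return [[l + a * l for l, a in zip(lr, ar)] for lr, ar in zip(low, attn)]


# Toy input: a flat low-level map and a high-level map that responds
# strongly in its top-left cell only.
low = [[1.0] * 4 for _ in range(4)]
high = [[10.0, -10.0], [-10.0, -10.0]]
out = enhance(low, high)
```

In this toy setting, positions that the high-level map marks as salient are roughly doubled, while the remaining positions stay close to their original values. Each scale can be processed by the same function independently of the others, which loosely mirrors the parallel detection branches described in the abstract.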

Acknowledgments

This work was supported by the Scientific Research Fund of Hunan Provincial Education Department of China (Project No. 17A007) and by the Teaching Reform and Research Project of Hunan Province of China (Project No. JG1615).

Author information

Authors and Affiliations

School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, 410114, China
Shuren Zhou & Jia Qiu

Corresponding author

Correspondence to Shuren Zhou.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhou, S., Qiu, J. Enhanced SSD with interactive multi-scale attention features for object detection. Multimed Tools Appl 80, 11539–11556 (2021). https://doi.org/10.1007/s11042-020-10191-2

Received: 21 March 2019
Revised: 30 September 2020
Accepted: 24 November 2020
Published: 06 January 2021
Issue Date: March 2021
DOI: https://doi.org/10.1007/s11042-020-10191-2
