
Adaptive feature fusion with attention mechanism for multi-scale target detection

  • Original Article
Neural Computing and Applications

Abstract

To detect targets of different sizes, multi-scale outputs are used by target detectors such as YOLO V3 and DSSD. To improve detection performance, YOLO V3 and DSSD perform feature fusion by combining two adjacent scales. However, fusing features only between adjacent scales is not sufficient: it does not take advantage of the features at the other scales. Moreover, concatenation, the common operation for feature fusion, provides no mechanism to learn the importance and correlation of the features at different scales. In this paper, we propose adaptive feature fusion with attention mechanism (AFFAM) for multi-scale target detection. AFFAM utilizes a pathway layer and a subpixel convolution layer to resize the feature maps, which helps to learn better and more complex feature mappings. In addition, AFFAM utilizes a global attention mechanism and a spatial position attention mechanism to adaptively learn, respectively, the correlation of the channel features and the importance of the spatial features at different scales. Finally, we combine AFFAM with YOLO V3 to build an efficient multi-scale target detector. Comparative experiments are conducted on the PASCAL VOC, KITTI and Smart UVM datasets. Compared with state-of-the-art target detectors, YOLO V3 with AFFAM achieves 84.34% mean average precision (mAP) at 19.9 FPS on PASCAL VOC, 87.2% mAP at 21 FPS on KITTI and 99.22% mAP at 20.6 FPS on Smart UVM, outperforming the other advanced target detectors.
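The abstract describes AFFAM only at a high level. The following PyTorch sketch illustrates one plausible reading of that description under stated assumptions: an SE-style global (channel) attention, a CBAM-like spatial position attention, sub-pixel convolution (pixel shuffle) for upsampling the coarser scale, and a pixel-unshuffle stand-in for the pathway (space-to-depth) layer that downsamples the finer scale. All module names, reduction ratios and kernel sizes are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of an AFFAM-style fusion block (PyTorch).
# Names and hyper-parameters are assumptions, not the authors' code.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """SE-style global attention over channels (cf. Hu et al., ref. 34)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Global average pooling -> per-channel weights in (0, 1).
        w = self.fc(x.mean(dim=(2, 3)))
        return x * w.view(x.size(0), -1, 1, 1)


class SpatialAttention(nn.Module):
    """Learns a per-position importance map from pooled channel statistics."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg_map = x.mean(dim=1, keepdim=True)
        max_map, _ = x.max(dim=1, keepdim=True)
        w = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * w


class AFFAMFusion(nn.Module):
    """Fuses three detection scales into one target scale.

    The coarser map is upsampled by sub-pixel convolution (pixel shuffle),
    the finer map is downsampled by a space-to-depth pathway layer, and the
    concatenated result is re-weighted by channel and spatial attention.
    """
    def __init__(self, ch_coarse, ch_mid, ch_fine, out_channels):
        super().__init__()
        # Sub-pixel convolution: expand channels 4x, then PixelShuffle(2).
        self.up = nn.Sequential(
            nn.Conv2d(ch_coarse, ch_coarse * 4, 1),
            nn.PixelShuffle(2),
        )
        # Pathway (space-to-depth) layer: halves H, W and quadruples channels.
        self.down = nn.PixelUnshuffle(2)
        fused_ch = ch_coarse + ch_mid + ch_fine * 4
        self.reduce = nn.Conv2d(fused_ch, out_channels, 1)
        self.channel_att = ChannelAttention(out_channels)
        self.spatial_att = SpatialAttention()

    def forward(self, f_coarse, f_mid, f_fine):
        # f_coarse: deep/low-resolution map, f_mid: target scale,
        # f_fine: shallow/high-resolution map.
        up = self.up(f_coarse)        # bring coarse features to the mid scale
        down = self.down(f_fine)      # bring fine features to the mid scale
        fused = torch.cat([up, f_mid, down], dim=1)
        fused = self.reduce(fused)
        fused = self.channel_att(fused)   # correlation of channel features
        return self.spatial_att(fused)    # importance of spatial positions

In a YOLO V3 head, one such block would presumably feed each of the three prediction scales, with the attention-weighted fusion replacing the plain concatenation used in the baseline.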



References

  1. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings of NIPS, pp 1097–1105

  2. Liu L, Ouyang W, Wang X, Fieguth P, Liu X, Pietikäinen M (2018) Deep learning for generic object detection: a survey. arXiv preprint https://arxiv.org/abs/1809.02165v4

  3. Zhang H, Ji Y, Huang W et al (2019) Sitcom-star-based clothing retrieval for video advertising: a deep learning framework. Neural Comput Appl 31:7361–7380


  4. Levi G, Hassner T (2015) Age and gender classification using convolution neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 34–42

  5. Attia A, Dayan S (2018) Detecting and counting tiny faces. arXiv preprint https://arxiv.org/abs/1801.06504

  6. Makantasis K, Doulamis A, Doulamis N, Psychas K (2016) Deep learning based human behavior recognition in industrial workflows. In: Proceedings of the international conference on image processing, pp 1609–1613

  7. Hasan M, Roy-Chowdhury AK (2015) A continuous learning framework for activity recognition using deep hybrid feature models. IEEE Trans Multimed 17:1909–1922


  8. Zhou Y, Liu L, Shao L, Mellor M (2016) DAVE: a unified framework for fast vehicle detection and annotation. arXiv preprint https://arxiv.org/abs/1607.04564

  9. Wang L, Lu Y, Wang H, Zheng Y, Ye H, Xue X (2017) Evolving boxes for fast vehicle detection. arXiv preprint https://arxiv.org/abs/1702.00254

  10. Liu K, Mattyus G (2015) Fast multiclass vehicle detection on aerial images. IEEE Geosci Remote Sens Lett 12:1938–1942


  11. Chen X, Ma H, Wan J, Li B, Xia T (2016) Multi-view 3D object detection network for autonomous driving. arXiv preprint https://arxiv.org/abs/1611.07759

  12. Uçar A, Demir Y, Güzeli C (2017) Object recognition and detection with deep learning for autonomous driving applications. Simulation 93:759–769


  13. Nguyen-Meidine LT, Granger E, Kiran M, Blais-Morin LA (2018) A comparison of cnn-based face and head detectors for real-time video surveillance applications. arXiv preprint https://arxiv.org/abs/1809.03336

  14. Yu R, Wang H, Davis LS (2018) ReMotENet: efficient relevant motion event detection for large-scale home surveillance videos. arXiv preprint https://arxiv.org/abs/1801.02031

  15. Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35:1798–1828


  16. Redmon J, Farhadi A (2018) YOLOv3: An incremental improvement. In: Proceedings of the IEEE conference on computer vision and pattern recognition. arXiv preprint https://arxiv.org/abs/1804.02767

  17. Liu W, Anguelov D, Erhan D et al (2016) SSD: single shot multibox detector. In: Proceedings of the European conference on computer vision, pp 21–37

  18. Fu CY, Liu W, Ranga A et al (2017) DSSD: deconvolutional single shot detector. In: Proceedings of the IEEE conference on computer vision and pattern recognition. arXiv preprint https://arxiv.org/abs/1701.06659

  19. Lin T, Dollar P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125

  20. Shi W, Caballero J, Huszár F, Totz J, Aitken AP, Bishop R et al (2016) Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In: Proceedings of the IEEE conference on computer vision and pattern recognition

  21. Rezatofighi H, Tsoi N, Gwak JY, Sadeghian A, Reid I, Savarese S (2019) Generalized intersection over union: a metric and a loss for bounding box regression. arXiv preprint https://arxiv.org/abs/1902.09630

  22. Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell 42:318–327


  23. Girshick R, Donahue J, Darrell T et al (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition

  24. Girshick R (2015) Fast R-CNN. In: IEEE international conference on computer vision and pattern recognition, pp 1440–1448

  25. Ren S, He K, Girshick RB et al (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39:1137–1149


  26. He K, Gkioxari G, Dollár P et al (2018) Mask R-CNN. In: Proceedings of the IEEE conference on computer vision and pattern recognition

  27. Cai Z, Vasconcelos N (2018) Cascade R-CNN: delving into high quality object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition

  28. van de Sande KEA, Uijlings JRR, Gevers T et al (2011) Segmentation as selective search for object recognition. In: Proceedings of the 2011 international conference on computer vision, pp 1879–1886

  29. Redmon J, Divvala S, Girshick R et al (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788

  30. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6517–6525

  31. Zhou X, Wang D, Philipp K (2019) Objects as points. arXiv preprint https://arxiv.org/abs/1904.07850

  32. Law H, Deng J (2018) CornerNet: detecting objects as paired keypoints. In: Proceedings of the European conference on computer vision, pp 765–781

  33. Jia XL, Liu Y (2019) An enhanced SSD with feature fusion and visual reasoning for object detection. Neural Comput Appl 31:6549–6558


  34. Hu J, Shen L, Albanie S, Sun G, Wu E (2020) Squeeze-and-excitation networks. IEEE Trans Pattern Anal Mach Intell 42:2011–2023


  35. Loshchilov I, Hutter F (2016) SGDR: stochastic gradient descent with warm restarts. arXiv preprint https://arxiv.org/abs/1608.03983

  36. Li Y, He K, Sun J et al (2016) R-FCN: object detection via region-based fully convolutional networks. In: Proceedings of NIPS.

  37. Dai J et al (2017) Deformable convolutional networks. In: Proceedings of the European conference on computer vision

  38. Liu S, Huang D, Wang Y (2018) Receptive field block net for accurate and fast object detection. In: Proceedings of the European conference on computer vision

  39. Wu TS, Zhang ZJ, Liu YP et al (2018) A lightweight small object detection algorithm based on improved SSD. Infrared Laser Eng 47(7):703005


  40. Zhang H, Li D, Ji Y, Zhou H, Wu W, Liu K (2019) Towards new retail: a benchmark dataset for smart unmanned vending machines. IEEE Trans Ind Inf

Download references

Author information

Corresponding author

Correspondence to Haibo Luo.

Ethics declarations

Conflict of interest

We declare that we have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Ju, M., Luo, J., Wang, Z. et al. Adaptive feature fusion with attention mechanism for multi-scale target detection. Neural Comput & Applic 33, 2769–2781 (2021). https://doi.org/10.1007/s00521-020-05150-9

