FDAM: full-dimension attention module for deep convolutional neural networks

Cai, Silin; Wang, Changping; Ding, Jiajun; Yu, Jun; Fan, Jianping

doi:10.1007/s13735-022-00248-3

FDAM: full-dimension attention module for deep convolutional neural networks

REGULAR PAPER
Published: 08 November 2022

Volume 11, pages 599–610, (2022)
Cite this article

International Journal of Multimedia Information Retrieval Aims and scope Submit manuscript

Silin Cai¹^na1,
Changping Wang¹^na1,
Jiajun Ding^2,3,
Jun Yu² &
…
Jianping Fan²

457 Accesses
Explore all metrics

Abstract

The attention mechanism is an important component of cross-modal research. It can improve the performance of convolutional neural networks by distinguishing the informative parts of the feature map from the useless ones. Various kinds of attention are proposed by recent studies. Different attentions use distinct division method to weight each part of the feature map. In this paper, we propose a full-dimension attention module, which is a lightweight, fully interactive 3-D attention mechanism. FDAM generates 3-D attention maps for both spatial and channel dimensions in parallel and then multiplies them to the feature map. It is difficult to obtain discriminative attention map cell under channel interaction at a low computational cost. Therefore, we adapt a generalized Elo rating mechanism to generate cell-level attention maps. We store historical information with a slight amount of non-training parameters to spread the computation over each training iteration. The proposed module can be seamlessly integrated into the end-to-end training of the CNN framework. Experiments demonstrate that it outperforms many existing attention mechanisms on different network structures and datasets for computer vision tasks, such as image classification and object detection.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 7

LMA: lightweight mixed-domain attention for efficient network design

Article 11 October 2022

DoubleViT: Pushing Transformers Towards the End Because of Convolutions

TDAM: Top-Down Attention Module for Contextually Guided Feature Selection in CNNs

References

Berg A (2020) Statistical analysis of the elo rating system in chess. Chance 33(3):31–38
Article Google Scholar
Cao Y, Xu J, Lin S, et al (2019) Gcnet: non-local networks meet squeeze-excitation networks and beyond. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pp 0–0
Chen H, Ding G, Liu X, et al (2020) Imram: iterative matching with recurrent attention memory for cross-modal image-text retrieval. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12655–12663
Chen K, Wang J, Pang J, et al (2019) Mmdetection: open mmlab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155
Chen LC, Papandreou G, Kokkinos I et al (2017) Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848
Article Google Scholar
Elo AE (1978) The rating of chessplayers, past and present. BT Batsford Limited, London
Google Scholar
Goyal P, Dollár P, Girshick R, et al (2017) Accurate, large minibatch sgd: training imagenet in 1 hour. arXiv preprint arXiv:1706.02677
Guo MH, Xu TX, Liu JJ, et al (2021) Attention mechanisms in computer vision: a survey. arXiv preprint arXiv:2111.07624
Guo Y, Liu Y, Georgiou T et al (2018) A review of semantic segmentation using deep neural networks. Int J Multimedia Inf Retriev 7(2):87–93
Article Google Scholar
He K, Zhang X, Ren S, et al (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
He K, Zhang X, Ren S et al (2016) Identity mappings in deep residual networks. European conference on computer vision. Springer, Cham, pp 630–645
Google Scholar
Hou Q, Zhou D, Feng J (2021) Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 13713–13722
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141
Huang G, Liu Z, Van Der Maaten L, et al (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 4700–4708
Huang G, Liu S, Van der Maaten L, et al (2018) Condensenet: an efficient densenet using learned group convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 2752–2761
Huddar MG, Sannakki SS, Rajpurohit VS (2020) Multi-level context extraction and attention-based contextual inter-modal fusion for multimodal sentiment analysis and emotion classification. Int J Multimedia Inf Retriev 9(2):103–112
Article Google Scholar
Krizhevsky A, Hinton G, et al (2009) Learning multiple layers of features from tiny images
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25:1097–1105
Google Scholar
LeCun Y, Bottou L, Bengio Y et al (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
Article Google Scholar
Lee CY, Xie S, Gallagher P, et al (2015) Deeply-supervised nets. In: Artificial intelligence and statistics. PMLR, pp 562–570
Li J, Guo Y, Lao S et al (2022) Few2decide: towards a robust model via using few neuron connections to decide. Int J Multimedia Inf Retriev 11(2):189–198
Article Google Scholar
Li Z, Tang J, Mei T (2018) Deep collaborative embedding for social image understanding. IEEE Trans Pattern Anal Mach Intell 41(9):2070–2083
Lin TY, Maire M, Belongie S et al (2014) Microsoft coco: common objects in context. European conference on computer vision. Springer, Cham, pp 740–755
Google Scholar
Lin TY, Dollár P, Girshick R, et al (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 2117–2125
Liu W, Anguelov D, Erhan D et al (2016) Ssd: single shot multibox detector. European conference on computer vision. Springer, Cham, pp 21–37
Google Scholar
Park J, Woo S, Lee JY, et al (2018) Bam: bottleneck attention module. arXiv preprint arXiv:1807.06514
Pelánek R (2016) Applications of the elo rating system in adaptive educational systems. Comput Educ 98:169–179
Article Google Scholar
Peng Z, Li Z, Zhang J, et al (2019) Few-shot image recognition with knowledge transfer. In: Proceedings of the IEEE/CVF international conference on computer vision. pp 441–449
Qin Z, Zhang P, Wu F, et al (2021) Fcanet: frequency channel attention networks. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 783–792
Ren S, He K, Girshick R, et al (2015) Faster r-cnn: towards real-time object detection with region proposal networks. Adva Neural Inf Process Syst 28
Sandler M, Howard A, Zhu M, et al (2018) Mobilenetv2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 4510–4520
Shojaei G, Razzazi F (2019) Semi-supervised domain adaptation for pedestrian detection in video surveillance based on maximum independence assumption. Int J Multimedia Inf Retriev 8(4):241–252
Article Google Scholar
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. Comput Sci
Szczecinski L, Djebbi A (2020) Understanding draws in elo rating algorithm. J Quant Anal Sports 16(3):211–220
Article Google Scholar
Szegedy C, Liu W, Jia Y, et al (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 1–9
Wang F, Jiang M, Qian C, et al (2017) Residual attention network for image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 3156–3164
Wang Q, Wu B, Zhu P, et al (2020) Eca-net: efficient channel attention for deep convolutional neural networks, 2020 ieee. In: CVF conference on computer vision and pattern recognition (CVPR). IEEE
Wang X, Girshick R, Gupta A, et al (2018) Non-local neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 7794–7803
Woo S, Park J, Lee JY, et al (2018) Cbam: convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV). pp 3–19
Xie J, Ma Z, Chang D, et al (2021) Gpca: a probabilistic framework for gaussian process embedded channel attention. IEEE Trans Pattern Anal Mach Intell
Xie S, Girshick R, Dollár P, et al (2017) Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 1492–1500
Yang L, Zhang RY, Li L, et al (2021) Simam: a simple, parameter-free attention module for convolutional neural networks. In: International conference on machine learning. PMLR, pp 11863–11874
Yang Z, Zhu L, Wu Y, et al (2020) Gated channel transformation for visual recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 11794–11803
Yuan Y, Chen X, Wang J (2020) Object-contextual representations for semantic segmentation. In: Computer vision–ECCV 2020: 16th European conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VI 16. Springer, pp 173–190
Zagoruyko S, Komodakis N (2016) Wide residual networks. arXiv preprint arXiv:1605.07146
Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. European conference on computer vision. Springer, Cham, pp 818–833

Download references

Acknowledgements

The authors acknowledge the financial supported by the Fundamental Research Funds for the Provincial Universities of Zhejiang (Grant No. GK219909299001-015), the Natural Science Foundation of China (Grant No. 62206082), National Undergraduate Training Program for Innovation and Entrepreneurship (Grant No. 202110336042) and Planted talent plan (Grant No. 2022R407A002).

Author information

Silin Cai and Changping Wang have been contributed equally.

Authors and Affiliations

Zhuoyue Honor College, Hangzhou Dianzi University, No. 2 Main Street, Hangzhou, 310018, Zhejiang, China
Silin Cai & Changping Wang
School of Computer Science and Technology, Hangzhou Dianzi University, No. 2 Main Street, Hangzhou, 310018, Zhejiang, China
Jiajun Ding, Jun Yu & Jianping Fan
Hangzhou Dianzi University Shangyu Institute of Science and Engineering, Shangyu, 312300, Zhejiang, China
Jiajun Ding

Authors

Silin Cai
View author publications
You can also search for this author inPubMed Google Scholar
Changping Wang
View author publications
You can also search for this author inPubMed Google Scholar
Jiajun Ding
View author publications
You can also search for this author inPubMed Google Scholar
Jun Yu
View author publications
You can also search for this author inPubMed Google Scholar
Jianping Fan
View author publications
You can also search for this author inPubMed Google Scholar

Corresponding author

Correspondence to Jianping Fan.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Cai, S., Wang, C., Ding, J. et al. FDAM: full-dimension attention module for deep convolutional neural networks. Int J Multimed Info Retr 11, 599–610 (2022). https://doi.org/10.1007/s13735-022-00248-3

Download citation

Received: 02 April 2022
Revised: 03 June 2022
Accepted: 03 August 2022
Published: 08 November 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s13735-022-00248-3

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

FDAM: full-dimension attention module for deep convolutional neural networks

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

LMA: lightweight mixed-domain attention for efficient network design

DoubleViT: Pushing Transformers Towards the End Because of Convolutions

TDAM: Top-Down Attention Module for Contextually Guided Feature Selection in CNNs

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now