Abstract
Existing recognition methods based on deep learning have achieved impressive performance. However, most of these algorithms do not fully utilize contextual information or discriminative parts, which limits recognition performance. In this paper, we propose a context-aware attention network that imitates the human visual attention mechanism. The proposed network consists mainly of a context learning module and an attention transfer module. First, we design the context learning module, which propagates contextual information along four directions (left, right, top, and bottom) to capture valuable contexts. Second, the attention transfer module is proposed to generate attention maps that highlight different attention regions, which benefits the extraction of discriminative features. Specifically, the attention maps are generated through multiple glimpses: in each glimpse, we generate the corresponding attention map and feed it into the next glimpse. Our attention therefore shifts constantly, and the shift is not random but closely tied to the previous attention. Finally, we aggregate all located attention regions to achieve accurate image recognition. Experimental results show that our method achieves state-of-the-art performance, with accuracies of 97.68%, 82.42%, 80.32%, and 86.12% on CIFAR-10, CIFAR-100, Caltech-256, and CUB-200, respectively.
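To make the two modules concrete, below is a minimal PyTorch sketch of how they could be wired. All names (ContextLearningModule, AttentionTransferModule), layer choices (GRUs for the directional sweeps, a 1x1 convolution for the attention head), and hyperparameters are illustrative assumptions of ours; the paper does not publish reference code, so treat this as a sketch of the idea rather than the authors' architecture.

```python
# Illustrative sketch only; module names, sizes, and wiring are assumptions,
# not the paper's published implementation.
import torch
import torch.nn as nn


class ContextLearningModule(nn.Module):
    """Propagates contextual information across the feature map in four
    directions (left-to-right, right-to-left, top-to-bottom, bottom-to-top)
    via bidirectional row/column GRUs, then fuses the context maps."""

    def __init__(self, channels, hidden=64):
        super().__init__()
        self.row_rnn = nn.GRU(channels, hidden, batch_first=True, bidirectional=True)
        self.col_rnn = nn.GRU(channels, hidden, batch_first=True, bidirectional=True)
        self.fuse = nn.Conv2d(4 * hidden, channels, kernel_size=1)

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        # Horizontal sweeps: each row is a sequence of length W.
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        row_ctx, _ = self.row_rnn(rows)        # (B*H, W, 2*hidden)
        row_ctx = row_ctx.reshape(b, h, w, -1).permute(0, 3, 1, 2)
        # Vertical sweeps: each column is a sequence of length H.
        cols = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        col_ctx, _ = self.col_rnn(cols)        # (B*W, H, 2*hidden)
        col_ctx = col_ctx.reshape(b, w, h, -1).permute(0, 3, 2, 1)
        return self.fuse(torch.cat([row_ctx, col_ctx], dim=1))


class AttentionTransferModule(nn.Module):
    """Generates a sequence of attention maps ("glimpses"); each glimpse is
    conditioned on the previous attention map, so attention shifts rather
    than being sampled independently at each step."""

    def __init__(self, channels, num_glimpses=3):
        super().__init__()
        self.num_glimpses = num_glimpses
        self.attend = nn.Conv2d(channels + 1, 1, kernel_size=1)

    def forward(self, feats):                  # feats: (B, C, H, W)
        b, _, h, w = feats.shape
        prev = feats.new_zeros(b, 1, h, w)     # no attention before glimpse 1
        glimpses = []
        for _ in range(self.num_glimpses):
            att = torch.sigmoid(self.attend(torch.cat([feats, prev], dim=1)))
            glimpses.append((feats * att).mean(dim=(2, 3)))  # pooled region feature
            prev = att                         # next glimpse sees the last attention
        # Aggregate all located attention regions for the final prediction.
        return torch.stack(glimpses, dim=1).mean(dim=1)      # (B, C)
```

In a full model, these modules would presumably sit between a convolutional backbone (e.g., VGG or ResNet features) and a linear classifier over the aggregated glimpse features.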
Acknowledgements
This project was partially supported by grants from the National Natural Science Foundation of China (71671178 and 91546201). It was also supported by University of Chinese Academy of Sciences Project Y954016XX2 and by Guangdong Provincial Science and Technology Project 2016B010127004.
Ethics declarations
Conflict of interest
We declare that we have no commercial or associative interest that represents a conflict of interest in connection with the submitted work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Leng, J., Liu, Y. & Chen, S. Context-aware attention network for image recognition. Neural Comput & Applic 31, 9295–9305 (2019). https://doi.org/10.1007/s00521-019-04281-y