Abstract
Emerging recurrent visual attention models typically rely on a sensor that repeatedly captures features from the input, and this sensor must be carefully designed: researchers usually need many trial-and-error attempts to determine an optimal structure for the sensor and its accompanying modules. In this work, we propose an adaptive multi-sensor visual attention model (AM-MA) that enhances the recurrent visual attention model. AM-MA uses several sensors to observe the original input recurrently, and the number of sensors can grow adaptively. Each sensor generates its own hidden state and is paired with a location network that provides its deployment scheme. We design a self-evaluation mechanism by which AM-MA decides during training whether to add new sensors, together with a fine-tuning mechanism that avoids a lengthy retraining process when the structure grows. AM-MA is therefore insensitive to its initial structural parameters: researchers need not pre-train the model to search for an optimal structure when the task complexity is unknown. Experimental results show that AM-MA not only outperforms the renowned sensor-based attention model on image classification tasks, but also achieves satisfactory results when given an inappropriate structure.
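The adaptive growth loop described above can be sketched in plain Python. This is an illustrative reconstruction, not the authors' implementation: the names `Sensor`, `self_evaluate`, and the plateau threshold are assumptions, and the glimpse extraction and networks are stubbed out.

```python
# Hypothetical sketch of AM-MA's adaptive sensor-growth loop.
# Sensor, self_evaluate, and the threshold value are illustrative
# assumptions; the real model trains glimpse and location networks.

class Sensor:
    """One glimpse sensor paired with a location network (both stubbed)."""
    def __init__(self, index):
        self.index = index

    def glimpse(self, image, location):
        # Stub: in the real model, crop a patch around `location`
        # and encode it into a hidden state.
        return (image, location, self.index)

def self_evaluate(accuracy_history, threshold=0.01):
    """Self-evaluation: if validation accuracy improved by less than
    `threshold` since the previous epoch, signal that a sensor should
    be added."""
    if len(accuracy_history) < 2:
        return False
    return (accuracy_history[-1] - accuracy_history[-2]) < threshold

def train(accuracy_per_epoch, max_sensors=4):
    """Run the growth loop over a sequence of per-epoch accuracies and
    return the final number of sensors."""
    sensors = [Sensor(0)]
    for epoch in range(len(accuracy_per_epoch)):
        history = accuracy_per_epoch[:epoch + 1]
        if self_evaluate(history) and len(sensors) < max_sensors:
            # Add a new sensor; in AM-MA, previously trained sensors
            # are fine-tuned rather than retrained from scratch.
            sensors.append(Sensor(len(sensors)))
    return len(sensors)
```

For example, `train([0.50, 0.60, 0.605, 0.606])` grows the model twice, because accuracy plateaus after the second epoch.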
Acknowledgements
This work is supported by the National Natural Science Foundation of China under Grants 62076202 and 61976178.
Ethics declarations
Conflict of interest
The authors declare that they have no commercial or associative interest that represents a conflict of interest in connection with the submitted work.
Cite this article
Chen, W., Li, J., Shi, H. et al. An adaptive multi-sensor visual attention model. Neural Comput & Applic 34, 7241–7252 (2022). https://doi.org/10.1007/s00521-021-06857-z