
An adaptive multi-sensor visual attention model

  • Original Article
  • Neural Computing and Applications


Abstract

Emerging recurrent visual attention models mostly rely on a single sensor that repeatedly captures features from the input, which requires a carefully tailored sensor design. Researchers usually need many attempts to determine the optimal structure for the sensor and its corresponding modules. In this work, an adaptive multi-sensor visual attention model (AM-MA) is proposed to enhance the recurrent visual attention model. The proposed model uses several sensors to observe the original input recurrently, and the number of sensors can be increased adaptively. Each sensor generates a hidden state and is followed by a location network that provides its deployment scheme. We design a self-evaluation mechanism for AM-MA, by which it can decide whether to add new sensors during training. In addition, AM-MA leverages a fine-tuning mechanism to avoid a lengthy training process. AM-MA is a parameter-insensitive model: researchers need not pre-train the model to find an optimal structure when the task complexity is unknown. Experimental results show that AM-MA not only outperforms the well-known sensor-based attention model on image classification tasks, but also achieves satisfactory results even when given an inappropriate initial structure.
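
The abstract describes the architecture only at a high level. As a rough illustration of the idea, and not the authors' implementation, the PyTorch sketch below shows one way several glimpse sensors, each with its own recurrent state and location network, could observe an image over a few recurrent steps, together with a hook through which a self-evaluation routine can add a sensor during training. Every module name, layer size, and the fusion and growth criteria here are assumptions.

    import torch
    import torch.nn as nn

    class GlimpseSensor(nn.Module):
        # One sensor: crops a square patch at a 2-D location, encodes it, and
        # updates its own recurrent state; a location network emits the next
        # fixation. Patch size, encoder, and GRU core are illustrative choices.
        def __init__(self, patch=8, hidden=128):
            super().__init__()
            self.patch = patch
            self.encode = nn.Sequential(nn.Flatten(), nn.LazyLinear(hidden), nn.ReLU())
            self.core = nn.GRUCell(hidden, hidden)
            self.locate = nn.Linear(hidden, 2)

        def forward(self, img, loc, h):
            _, _, H, W = img.shape
            # Map a location in [-1, 1]^2 to the patch's top-left pixel corner.
            xs = ((loc[:, 0] + 1) / 2 * (W - self.patch)).long().tolist()
            ys = ((loc[:, 1] + 1) / 2 * (H - self.patch)).long().tolist()
            patches = torch.stack([img[i, :, y:y + self.patch, x:x + self.patch]
                                   for i, (x, y) in enumerate(zip(xs, ys))])
            h = self.core(self.encode(patches), h)
            return h, torch.tanh(self.locate(h))  # new state, next fixation

    class AMMA(nn.Module):
        # Adaptive multi-sensor model: each sensor observes the input for a
        # fixed number of recurrent steps; sensor states are fused to classify.
        def __init__(self, n_classes=10, hidden=128, steps=4):
            super().__init__()
            self.hidden, self.steps = hidden, steps
            self.sensors = nn.ModuleList([GlimpseSensor(hidden=hidden)])
            self.classify = nn.Linear(hidden, n_classes)

        def add_sensor(self):
            # Hook for the self-evaluation mechanism: an external training loop
            # would call this when its quality criterion (assumed here, e.g. a
            # validation-accuracy plateau) indicates more sensors are needed.
            self.sensors.append(GlimpseSensor(hidden=self.hidden))

        def forward(self, img):
            b = img.size(0)
            states = [img.new_zeros(b, self.hidden) for _ in self.sensors]
            locs = [img.new_zeros(b, 2) for _ in self.sensors]  # start centred
            for _ in range(self.steps):
                for k, sensor in enumerate(self.sensors):
                    states[k], locs[k] = sensor(img, locs[k], states[k])
            fused = torch.stack(states).mean(0)  # average fusion (an assumption)
            return self.classify(fused)

    model = AMMA()
    logits = model(torch.randn(2, 1, 28, 28))  # e.g. an MNIST-sized batch
    model.add_sensor()                         # self-evaluation decided to grow
    logits = model(torch.randn(2, 1, 28, 28))  # two sensors now observe the input

Note that the hard crop is not differentiable with respect to the predicted location, so a full implementation would train the location networks with a policy-gradient method, as in the recurrent-attention literature, rather than plain backpropagation; that training loop, the fine-tuning step, and the paper's exact self-evaluation criterion are all omitted from this sketch.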





Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grants 62076202 and 61976178.

Author information

Corresponding authors

Correspondence to Haobin Shi or Kao-Shing Hwang.

Ethics declarations

Conflict of interest

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, W., Li, J., Shi, H. et al. An adaptive multi-sensor visual attention model. Neural Comput & Applic 34, 7241–7252 (2022). https://doi.org/10.1007/s00521-021-06857-z

