Visual Question Answering of Remote Sensing Image Based on Attention Mechanism

  • Conference paper
Intelligence Science IV (ICIS 2022)

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 659)

Abstract

In recent years, research on attention mechanisms has made significant progress in computer vision. In visual question answering on remote sensing images, an attention mechanism lets the model focus on the image regions that matter for the question, which improves answer accuracy. Our research focuses on the role of collaborative attention mechanisms in the interaction between question representations and visual representations. Building on Modular Collaborative Attention (MCA) and exploiting the complementary nature of global and local features, we use a hybrid connection strategy that lets the model perceive global features without weakening the attention distribution over local features. We evaluate the impact of the attention mechanism on four types of visual question answering questions: (i) scene classification, (ii) object comparison, (iii) quantitative statistics, and (iv) relational judgment. By fusing the global and local features of the different modalities, the model obtains more cross-modal information. Model performance is evaluated on the RSVQA-LR dataset. Experimental results show that the proposed method improves global accuracy by 9.81% over RSVQA.
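
The abstract describes the MCA-based co-attention and the hybrid global/local connection only at a high level. As a rough illustration, the following plain-Python toy shows a simplified guided-attention step in which local image-region features act as queries over question-word features, plus one hypothetical way to add a global image feature. It rests on several assumptions not taken from the paper: a single attention head, no learned projection matrices, and no layer normalization or feed-forward sublayer. The helper `with_global_token` is a hypothetical stand-in for the authors' hybrid connection strategy, whose details the abstract does not give; all function names and toy vectors are illustrative only.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Returns the attended outputs and the attention weights (one row per query).
    """
    scale = math.sqrt(len(keys[0]))
    weights = [softmax([dot(q, k) / scale for k in keys]) for q in queries]
    outputs = [
        [sum(w * v[j] for w, v in zip(w_row, values)) for j in range(len(values[0]))]
        for w_row in weights
    ]
    return outputs, weights


def guided_attention(image_feats: Matrix, question_feats: Matrix) -> Tuple[Matrix, Matrix]:
    """Simplified guided-attention step: image features query the question features.

    A residual connection adds the attended question context back to each image row.
    Learned projections, multiple heads, layer norm and the feed-forward sublayer
    of a full co-attention unit are deliberately omitted (assumption).
    """
    attended, weights = attention(image_feats, question_feats, question_feats)
    fused = [[x + y for x, y in zip(img, ctx)] for img, ctx in zip(image_feats, attended)]
    return fused, weights


def with_global_token(local_feats: Matrix, global_feat: List[float]) -> Matrix:
    """HYPOTHETICAL hybrid connection: append a global image feature as an extra query row."""
    return local_feats + [global_feat]


# Toy 2-D features (illustrative values only, not from the paper).
question = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three question-word vectors
regions = [[0.5, 0.1], [0.2, 0.9]]               # two local image-region vectors
global_vec = [0.35, 0.5]                         # here, the mean of the region vectors

fused_local, w_local = guided_attention(regions, question)
fused_hybrid, w_hybrid = guided_attention(with_global_token(regions, global_vec), question)
```

Because each query row receives its own softmax, appending the global feature as an extra query leaves every local region's attention distribution over the question words unchanged. This is one possible reading of "perceive global features without weakening the attention distribution of local features", not necessarily the design used in the paper.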

References

  1. Ba, J., Mnih, V., Kavukcuoglu, K.: Multiple object recognition with visual attention. arXiv preprint arXiv:1412.7755 (2014)

  2. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)

  3. Chen, Y.-C., et al.: UNITER: UNiversal Image-TExt Representation learning. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12375, pp. 104–120. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58577-8_7

  4. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  5. Ilievski, I., Yan, S., Feng, J.: A focused dynamic attention model for visual question answering. arXiv preprint arXiv:1604.01485 (2016)

  6. Kumar, A., et al.: Ask me anything: dynamic memory networks for natural language processing. In: International Conference on Machine Learning, pp. 1378–1387. PMLR (2016)

  7. Li, X., et al.: Oscar: object-semantics aligned pre-training for vision-language tasks. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12375, pp. 121–137. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58577-8_8

  8. Lobry, S., Marcos, D., Murray, J., Tuia, D.: RSVQA: visual question answering for remote sensing data. IEEE Trans. Geosci. Remote Sens. 58(12), 8555–8566 (2020)

  9. Lu, J., Batra, D., Parikh, D., Lee, S.: ViLBERT: pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. In: Advances in Neural Information Processing Systems, vol. 32 (2019)

  10. Luong, M.T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015)

  11. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, vol. 26 (2013)

  12. Xiong, C., Merity, S., Socher, R.: Dynamic memory networks for visual and textual question answering. In: International Conference on Machine Learning, pp. 2397–2406. PMLR (2016)

  13. Xu, K., et al.: Show, attend and tell: neural image caption generation with visual attention. In: International Conference on Machine Learning, pp. 2048–2057. PMLR (2015)

  14. Yang, Z., He, X., Gao, J., Deng, L., Smola, A.: Stacked attention networks for image question answering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 21–29 (2016)

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61772399 and Grant 62101517, in part by the Key Research and Development Program in Shaanxi Province of China under Grant 2019ZDLGY09-05, and in part by the Fund for Foreign Scholars in University Research and Teaching Programs (the 111 Project).

Author information

Corresponding author

Correspondence to Yangyang Li.

Copyright information

© 2022 IFIP International Federation for Information Processing

About this paper

Cite this paper

Zhang, S., Wei, Q., Li, Y., Chen, Y., Jiao, L. (2022). Visual Question Answering of Remote Sensing Image Based on Attention Mechanism. In: Shi, Z., Jin, Y., Zhang, X. (eds) Intelligence Science IV. ICIS 2022. IFIP Advances in Information and Communication Technology, vol 659. Springer, Cham. https://doi.org/10.1007/978-3-031-14903-0_25

  • DOI: https://doi.org/10.1007/978-3-031-14903-0_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-14902-3

  • Online ISBN: 978-3-031-14903-0

  • eBook Packages: Computer Science, Computer Science (R0)