Abstract
Satellite Image Situation Awareness (SISA) is the task of automatically generating semantic situation descriptions from satellite images. It requires not only the positions and basic attributes (color, size, etc.) of targets but also the relationships among them (counting, relative position, existence, comparison, etc.) and the application of situation-analysis rules. We propose a novel framework consisting of a Background Process, Visual Question Answering (VQA), and an Association Rules Set (ARS): the Background Process produces the situational map, while the VQA and ARS components identify the relationships by answering a set of SISA questions. To evaluate our method, we build an evaluation dataset based on CLEVR. Experiments demonstrate that our approach outperforms traditional SISA systems in accuracy and automation. To the best of our knowledge, we are the first to solve the situation-awareness problem with a VQA method. The contributions of our work are: (1) we show that SISA can be accomplished through VQA, without a precise scene graph; (2) we broaden the application of VQA.
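The three-component pipeline described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: every function name, the situational-map schema, and the stand-in answering logic are assumptions introduced here for clarity.

```python
# Hypothetical sketch of the SISA pipeline (Background Process -> VQA + ARS).
# All names and data structures are illustrative, not taken from the paper.

def background_process(image):
    """Produce a situational map: targets with positions and basic attributes."""
    # Placeholder: a real system would run target detectors on the satellite image.
    return {"targets": [{"type": "ship", "pos": (10, 20), "color": "gray"}]}

def vqa_answer(image, question):
    """Answer one natural-language question about the image (stand-in logic)."""
    if "how many ships" in question.lower():
        return "1"
    return "unknown"

def extract_situation(image, rule_questions):
    """Association Rules Set: pose each rule's question to the VQA module
    and attach the answers to the situational map."""
    situation = dict(background_process(image))
    situation["answers"] = {q: vqa_answer(image, q) for q in rule_questions}
    return situation

situation = extract_situation("satellite.png", ["How many ships are there?"])
print(situation["answers"])  # {'How many ships are there?': '1'}
```

In this sketch the ARS is reduced to a list of questions; in the framework it would also encode how the answers combine into a final situation assessment.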
© 2019 Springer Nature Switzerland AG
Cite this paper
Qu, X., Zhuang, D., Xie, H. (2019). Semantic Situation Extraction from Satellite Image Based on Neural Networks. In: Yu, H., Liu, J., Liu, L., Ju, Z., Liu, Y., Zhou, D. (eds) Intelligent Robotics and Applications. ICIRA 2019. Lecture Notes in Computer Science, vol 11741. Springer, Cham. https://doi.org/10.1007/978-3-030-27532-7_27
DOI: https://doi.org/10.1007/978-3-030-27532-7_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27531-0
Online ISBN: 978-3-030-27532-7