HTCViT: an effective network for image classification and segmentation based on natural disaster datasets

  • Original article
  • Published in The Visual Computer (2023)

Abstract

Classifying and segmenting natural disaster images is crucial for predicting and responding to disasters. However, current convolutional networks perform poorly on natural disaster images, and few dedicated networks exist for this task. To address the varying scales of the regions of interest (ROIs) in these images, we propose the Hierarchical TSAM-CB-ViT (HTCViT) network, which builds on the ViT attention mechanism to better process natural disaster images. Because ViT excels at extracting global context but struggles with local features, our method combines the strengths of ViT and convolution, capturing the overall contextual information within each patch through the Triple-Strip Attention Mechanism (TSAM). Experiments validate that, compared to the vanilla ViT network, HTCViT improves classification by \(3{-}4\%\) and segmentation by \(1{-}2\%\) on natural disaster datasets.
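The TSAM block itself is specified in the full text, which this preview omits. Purely as a hedged illustration of the idea the abstract describes (strip-shaped pooling that gathers row-wise, column-wise, and channel-wise context inside a patch's feature map, then reweights the features), a minimal PyTorch sketch might look as follows; the class name, branch layout, and kernel sizes are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class TripleStripAttention(nn.Module):
    # Hypothetical sketch: pool the feature map into horizontal-strip,
    # vertical-strip, and channel descriptors, refine each with a cheap
    # convolution, then gate the input with the fused context.
    def __init__(self, channels: int):
        super().__init__()
        self.conv_h = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))  # row context
        self.conv_w = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))  # column context
        self.conv_c = nn.Conv2d(channels, channels, 1)                       # channel context
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W), e.g., convolutional features of one image region
        h_ctx = self.conv_h(x.mean(dim=3, keepdim=True))        # (N, C, H, 1)
        w_ctx = self.conv_w(x.mean(dim=2, keepdim=True))        # (N, C, 1, W)
        c_ctx = self.conv_c(x.mean(dim=(2, 3), keepdim=True))   # (N, C, 1, 1)
        gate = torch.sigmoid(self.fuse(h_ctx + w_ctx + c_ctx))  # broadcasts to (N, C, H, W)
        return x * gate

feats = torch.randn(2, 64, 56, 56)
print(TripleStripAttention(64)(feats).shape)  # torch.Size([2, 64, 56, 56])

Pooling along strips rather than over full 2D windows keeps the added context cost proportional to H + W rather than H × W, which is one plausible reason strip-style attention suits ROIs whose scale varies as widely as it does in disaster imagery.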


Data availability

Data will be made available on request.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Nos. U21A20515, 61972459, 62172416, 62102414, U2003109, 62071157, 62171321, and 62162044), in part by the Open Research Projects of ZhejiangLab (No. 2021KE0AB07), and in part by Project TC210H00L/42.

Author information

Corresponding authors

Correspondence to Weiliang Meng or Shibiao Xu.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (MP4, 131,092 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ma, Z., Li, W., Zhang, M. et al. HTCViT: an effective network for image classification and segmentation based on natural disaster datasets. Vis Comput 39, 3285–3297 (2023). https://doi.org/10.1007/s00371-023-02954-3
