HTCViT: an effective network for image classification and segmentation based on natural disaster datasets

  • Original article
  • Published in The Visual Computer (2023)

Abstract

Classifying and segmenting natural disaster images is crucial for predicting and responding to disasters. However, current convolutional networks perform poorly on natural disaster images, and few dedicated networks exist for this task. To address the varying scales of the regions of interest (ROIs) in these images, we propose the Hierarchical TSAM-CB-ViT (HTCViT) network, which builds on the ViT attention mechanism to better process natural disaster images. Because ViT excels at extracting global context but struggles with local features, our method combines the strengths of ViT and convolution, capturing the overall contextual information within each patch through the Triple-Strip Attention Mechanism (TSAM). Experiments validate that, compared to the vanilla ViT network, HTCViT improves classification by \(3{-}4\%\) and segmentation by \(1{-}2\%\) on natural disaster datasets.
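The TSAM block itself is specified in the full text, which this preview omits. Purely as a hedged illustration of the idea the abstract describes (strip-shaped pooling that gathers row-wise, column-wise, and channel-wise context inside a patch's feature map, then reweights the features), a minimal PyTorch sketch might look as follows; the class name, branch layout, and kernel sizes are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class TripleStripAttention(nn.Module):
    # Hypothetical sketch: pool the feature map into horizontal-strip,
    # vertical-strip, and channel descriptors, refine each with a cheap
    # convolution, then gate the input with the fused context.
    def __init__(self, channels: int):
        super().__init__()
        self.conv_h = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))  # row context
        self.conv_w = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))  # column context
        self.conv_c = nn.Conv2d(channels, channels, 1)                       # channel context
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W), e.g., convolutional features of one image region
        h_ctx = self.conv_h(x.mean(dim=3, keepdim=True))        # (N, C, H, 1)
        w_ctx = self.conv_w(x.mean(dim=2, keepdim=True))        # (N, C, 1, W)
        c_ctx = self.conv_c(x.mean(dim=(2, 3), keepdim=True))   # (N, C, 1, 1)
        gate = torch.sigmoid(self.fuse(h_ctx + w_ctx + c_ctx))  # broadcasts to (N, C, H, W)
        return x * gate

feats = torch.randn(2, 64, 56, 56)
print(TripleStripAttention(64)(feats).shape)  # torch.Size([2, 64, 56, 56])

Pooling along strips rather than over full 2D windows keeps the added context cost proportional to H + W rather than H × W, which is one plausible reason strip-style attention suits ROIs whose scale varies as widely as it does in disaster imagery.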


Data availability

Data will be made available on request.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Nos. U21A20515, 61972459, 62172416, 62102414, U2003109, 62071157, 62171321, and 62162044), in part by the Open Research Projects of ZhejiangLab (No. 2021KE0AB07), and in part by Project TC210H00L/42.

Author information

Corresponding authors

Correspondence to Weiliang Meng or Shibiao Xu.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (MP4, 131,092 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ma, Z., Li, W., Zhang, M. et al. HTCViT: an effective network for image classification and segmentation based on natural disaster datasets. Vis Comput 39, 3285–3297 (2023). https://doi.org/10.1007/s00371-023-02954-3
