MFVNet: a deep adaptive fusion network with multiple field-of-views for remote sensing image semantic segmentation

Li, Yansheng; Chen, Wei; Huang, Xin; Gao, Zhi; Li, Siwei; He, Tao; Zhang, Yongjun

doi:10.1007/s11432-022-3599-y


Research Paper
Published: 27 March 2023

Volume 66, article number 140305 (2023)

Science China Information Sciences



Abstract

In recent years, remote sensing image (RSI) semantic segmentation has attracted increasing research interest owing to its wide range of applications. Because of their large fields of view (FOVs), RSIs are difficult to process holistically on current GPUs, yet the prevailing workarounds, downsampling and cropping, inevitably degrade segmentation quality. To resolve this conflict, this paper proposes MFVNet, a deep adaptive fusion network with multiple FOVs designed specifically for RSI semantic segmentation. Unlike existing methods, MFVNet explicitly accounts for the differences among multiple FOVs. We first apply pyramid sampling to the RSI to obtain images at different scales with multiple FOVs. Images at a high scale with a large FOV capture wider spatial context and complete object contours, whereas images at a low scale with a small FOV retain higher spatial resolution and finer detail. Scale-specific models are then chosen to make the best prediction at each scale. Next, the scale alignment module aligns the output feature maps and score maps to overcome spatial misregistration among scales. Finally, the aligned score maps are fused using adaptive weight maps generated by the adaptive fusion module, yielding the fused prediction. MFVNet outperforms previous state-of-the-art semantic segmentation models on three typical RSI datasets, demonstrating its effectiveness.
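The multi-FOV pipeline summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the scale factors (1, 2, 4), the 8-pixel tile, strided (nearest-neighbour) downsampling, the fixed geometric crop-and-repeat used in place of the learned scale alignment module, the per-pixel softmax used for the adaptive weights, and the random arrays standing in for the scale-specific models and the adaptive fusion module are all assumptions, and every function name is hypothetical. What it does show is how windows with growing FOVs around one location are brought back onto a common grid, and how per-pixel weights turn several aligned score maps into one fused prediction.

```python
import numpy as np


def pyramid_sample(image, center, tile, scales):
    """Cut a (tile*k)-sided window around `center` for each scale factor k and
    downsample it by k, so every view is tile x tile but covers a k-times larger FOV.
    Assumes tile*k is even; raises if a window leaves the image."""
    cy, cx = center
    views = []
    for k in scales:
        half = tile * k // 2
        if (cy - half < 0 or cx - half < 0
                or cy + half > image.shape[0] or cx + half > image.shape[1]):
            raise ValueError(f"window for scale {k} leaves the image")
        window = image[cy - half:cy + half, cx - half:cx + half]
        views.append(window[::k, ::k])  # strided slicing = nearest-neighbour downsampling
    return views


def align_to_base(score, k):
    """Crop the (t/k)-sided central region of a scale-k map, which is the part that
    overlaps the scale-1 tile, and upsample it by pixel repetition onto the scale-1 grid.
    Assumes the map side t is divisible by k."""
    t = score.shape[0]
    side = t // k
    start = (t - side) // 2
    center = score[start:start + side, start:start + side]
    return np.repeat(np.repeat(center, k, axis=0), k, axis=1)


def adaptive_fuse(aligned_scores, weight_logits):
    """Fuse S aligned score maps of shape (S, H, W, C) using per-pixel weights from a
    softmax over the S scales of `weight_logits` (shape (S, H, W))."""
    w = np.exp(weight_logits - weight_logits.max(axis=0, keepdims=True))
    w /= w.sum(axis=0, keepdims=True)
    return (w[..., None] * aligned_scores).sum(axis=0)


rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
tile, scales, num_classes = 8, (1, 2, 4), 5
views = pyramid_sample(image, center=(32, 32), tile=tile, scales=scales)

# Hypothetical stand-ins for the scale-specific models and the fusion head.
scores = [rng.random((tile, tile, num_classes)) for _ in views]
aligned = np.stack([align_to_base(s, k) for s, k in zip(scores, scales)])
logits = rng.normal(size=(len(scales), tile, tile))
fused = adaptive_fuse(aligned, logits)
pred = fused.argmax(axis=-1)
print(fused.shape, pred.shape)  # (8, 8, 5) (8, 8)
```

Because the weights at each pixel sum to one in this sketch, the fused score is a convex combination of the per-scale scores: a pixel can lean on the large-FOV prediction where context matters and on the small-FOV prediction where boundary detail matters.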



Acknowledgements

This work was supported in part by State Key Program of the National Natural Science Foundation of China (Grant No. 42030102), Foundation for Innovative Research Groups of the Natural Science Foundation of Hubei Province (Grant No. 2020CFA003), National Natural Science Foundation of China (Grant No. 41971284), Fundamental Research Funds for the Central Universities (Grant No. 2042022kf1201), and Special Fund of Hubei Luojia Laboratory.

Author information

Authors and Affiliations

School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, 430079, China
Yansheng Li, Wei Chen, Xin Huang, Zhi Gao, Siwei Li, Tao He & Yongjun Zhang


Corresponding authors

Correspondence to Wei Chen or Yongjun Zhang.


About this article

Cite this article

Li, Y., Chen, W., Huang, X. et al. MFVNet: a deep adaptive fusion network with multiple field-of-views for remote sensing image semantic segmentation. Sci. China Inf. Sci. 66, 140305 (2023). https://doi.org/10.1007/s11432-022-3599-y


Received: 19 June 2022
Revised: 25 August 2022
Accepted: 14 October 2022
Published: 27 March 2023
DOI: https://doi.org/10.1007/s11432-022-3599-y
