MFVNet: a deep adaptive fusion network with multiple field-of-views for remote sensing image semantic segmentation

  • Research Paper
  • Published:
Science China Information Sciences

Abstract

In recent years, remote sensing image (RSI) semantic segmentation has attracted increasing research interest due to its wide range of applications. RSIs are difficult to process holistically on current GPU cards because of their large field-of-views (FOVs), yet the prevailing workarounds, such as downsampling and cropping, inevitably degrade segmentation quality. To address this conflict, this paper proposes a new deep adaptive fusion network with multiple FOVs (MFVNet), specially designed for RSI semantic segmentation. Unlike existing methods, MFVNet takes the differences among multiple FOVs into consideration. By pyramid sampling the RSI, we first obtain images at different scales with multiple FOVs. Images at the high scale with a large FOV capture larger spatial contexts and complete object contours, while images at the low scale with a small FOV retain higher spatial resolution and more detailed information. Scale-specific models are then chosen to make the best predictions at each scale. Next, the output feature maps and score maps are aligned through the scale alignment module to overcome spatial misregistration among scales. Finally, the aligned score maps are fused with the help of adaptive weight maps generated by the adaptive fusion module, producing the fused prediction. MFVNet surpasses previous state-of-the-art semantic segmentation models on three typical RSI datasets, demonstrating its effectiveness.
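
To make the fusion pipeline described above concrete, the following is a minimal PyTorch-style sketch of the overall idea: pyramid-sample the input, run a scale-specific model per FOV, bring all predictions to a common resolution, and fuse them with learned per-pixel weight maps. The module names (AdaptiveFusion, mfv_inference), the weight-head design, the assumed (features, scores) model output, and the use of plain bilinear upsampling in place of the paper's scale alignment module are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn.functional as F
from torch import nn

class AdaptiveFusion(nn.Module):
    """Hypothetical adaptive fusion: predict per-pixel weight maps from
    concatenated multi-scale features and fuse aligned score maps."""
    def __init__(self, feat_channels, num_scales):
        super().__init__()
        self.weight_head = nn.Sequential(
            nn.Conv2d(feat_channels * num_scales, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, num_scales, 1),
        )

    def forward(self, feats, scores):
        # feats, scores: lists of tensors already aligned to the base scale,
        # shaped (B, C, H, W) and (B, num_classes, H, W) respectively.
        w = torch.softmax(self.weight_head(torch.cat(feats, dim=1)), dim=1)
        fused = sum(w[:, i:i + 1] * s for i, s in enumerate(scores))
        return fused, w

def mfv_inference(image, scale_models, fusion, scales=(1.0, 0.5, 0.25)):
    """Pyramid-sample the input, run one scale-specific model per FOV,
    upsample everything to the base resolution (a simple stand-in for the
    scale alignment module), then apply adaptive fusion."""
    base_size = image.shape[-2:]
    feats, scores = [], []
    for s, model in zip(scales, scale_models):
        x = image if s == 1.0 else F.interpolate(
            image, scale_factor=s, mode="bilinear", align_corners=False)
        feat, score = model(x)  # assumed to return (features, class scores)
        feats.append(F.interpolate(feat, size=base_size, mode="bilinear",
                                   align_corners=False))
        scores.append(F.interpolate(score, size=base_size, mode="bilinear",
                                    align_corners=False))
    fused, weights = fusion(feats, scores)
    return fused.argmax(dim=1), weights

In the full model, the scale alignment module would replace the plain bilinear upsampling used here, and the weight head would be trained jointly with the scale-specific models rather than used only at inference time.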

Acknowledgements

This work was supported in part by the State Key Program of the National Natural Science Foundation of China (Grant No. 42030102), the Foundation for Innovative Research Groups of the Natural Science Foundation of Hubei Province (Grant No. 2020CFA003), the National Natural Science Foundation of China (Grant No. 41971284), the Fundamental Research Funds for the Central Universities (Grant No. 2042022kf1201), and the Special Fund of Hubei Luojia Laboratory.

Author information

Corresponding authors

Correspondence to Wei Chen or Yongjun Zhang.

About this article

Cite this article

Li, Y., Chen, W., Huang, X. et al. MFVNet: a deep adaptive fusion network with multiple field-of-views for remote sensing image semantic segmentation. Sci. China Inf. Sci. 66, 140305 (2023). https://doi.org/10.1007/s11432-022-3599-y
