Abstract
In recent years, remote sensing image (RSI) semantic segmentation has attracted increasing research interest owing to its wide range of applications. Because of their large fields of view (FOVs), RSIs are difficult to process holistically on current GPU cards. However, the prevailing workarounds, such as downsampling and cropping, inevitably degrade the quality of semantic segmentation. To address this conflict, this paper proposes MFVNet, a new deep adaptive fusion network with multiple FOVs designed specifically for RSI semantic segmentation. Unlike existing methods, MFVNet takes the differences among multiple FOVs into consideration. We first pyramid-sample the RSI to obtain images at different scales with multiple FOVs. Images at a high scale with a large FOV capture larger spatial context and complete object contours, while images at a low scale with a small FOV retain higher spatial resolution and finer detail. Then, a scale-specific model is chosen for each scale to make the best prediction at that scale. Next, the output feature maps and score maps are aligned by the scale alignment module to overcome spatial misregistration among scales. Finally, the aligned score maps are fused using adaptive weight maps generated by the adaptive fusion module, producing the fused prediction. MFVNet outperforms previous state-of-the-art semantic segmentation models on three typical RSI datasets, demonstrating its effectiveness.
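The data flow described in the abstract (pyramid sampling, alignment of score maps across scales, and pixel-wise adaptive fusion) can be illustrated with a small pure-Python sketch. It rests on several assumptions that are not taken from the paper: the FOVs are nested, centre-aligned square windows; every FOV is resized to the same input size; alignment is done by plain geometric cropping to the smallest (common) FOV followed by nearest-neighbour resizing, standing in for the learned scale alignment module; and the adaptive weight maps are given as per-pixel logits and normalised with a softmax over scales, standing in for the output of the adaptive fusion module. The scale-specific models themselves are omitted. All function names (`pyramid_sample`, `align_to_smallest_fov`, `adaptive_fuse`, and so on) are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Sequence

# A square grid of values: image pixels, weight logits, or per-pixel
# class-score vectors. Indexed as grid[row][col]. (Hypothetical helper type.)
Grid = List[list]


def center_crop(grid: Grid, size: int) -> Grid:
    """Return the size x size window centred in a square grid."""
    start = (len(grid) - size) // 2
    return [row[start:start + size] for row in grid[start:start + size]]


def resize_nearest(grid: Grid, out: int) -> Grid:
    """Nearest-neighbour resize of a square grid to out x out."""
    n = len(grid)
    return [[grid[i * n // out][j * n // out] for j in range(out)]
            for i in range(out)]


def pyramid_sample(image: Grid, fovs: Sequence[int], out: int) -> List[Grid]:
    """Crop one centred window per FOV and resize each to out x out.

    Assumption: small FOVs keep full resolution, while large FOVs trade
    resolution for spatial context.
    """
    return [resize_nearest(center_crop(image, f), out) for f in fovs]


def align_to_smallest_fov(score: Grid, fov: int, smallest: int) -> Grid:
    """Keep the part of a score map (covering `fov` image pixels) that overlaps
    the smallest FOV, then resize it back to the score map's own size.

    Assumption: a purely geometric stand-in for a scale alignment module.
    """
    out = len(score)
    keep = out * smallest // fov
    return resize_nearest(center_crop(score, keep), out)


def adaptive_fuse(scores: List[Grid], logits: List[Grid]) -> Grid:
    """Fuse aligned score maps with per-pixel softmax weights over scales."""
    h, w = len(scores[0]), len(scores[0][0])
    c = len(scores[0][0][0])
    fused = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Numerically stable softmax of the K weight logits at this pixel.
            m = max(lg[i][j] for lg in logits)
            ex = [math.exp(lg[i][j] - m) for lg in logits]
            z = sum(ex)
            for k, score in enumerate(scores):
                weight = ex[k] / z
                for ch in range(c):
                    fused[i][j][ch] += weight * score[i][j][ch]
    return fused


def predict(fused: Grid) -> List[List[int]]:
    """Per-pixel argmax over classes (ties go to the lowest class index)."""
    return [[max(range(len(p)), key=p.__getitem__) for p in row]
            for row in fused]
```

For example, with an 8 × 8 image, `pyramid_sample(image, [4, 8], 4)` yields a full-resolution 4 × 4 centre crop and a 2× downsampled view of the whole image. After a per-scale model produces a 4 × 4 score map from the large-FOV input, `align_to_smallest_fov(score, 8, 4)` brings it into register with the small-FOV map before the two are passed to `adaptive_fuse`. Normalising the weights with a softmax keeps the fused scores a convex combination of the per-scale scores at every pixel, so each scale's contribution can vary spatially.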
Acknowledgements
This work was supported in part by State Key Program of the National Natural Science Foundation of China (Grant No. 42030102), Foundation for Innovative Research Groups of the Natural Science Foundation of Hubei Province (Grant No. 2020CFA003), National Natural Science Foundation of China (Grant No. 41971284), Fundamental Research Funds for the Central Universities (Grant No. 2042022kf1201), and Special Fund of Hubei Luojia Laboratory.
About this article
Cite this article
Li, Y., Chen, W., Huang, X. et al. MFVNet: a deep adaptive fusion network with multiple field-of-views for remote sensing image semantic segmentation. Sci. China Inf. Sci. 66, 140305 (2023). https://doi.org/10.1007/s11432-022-3599-y
DOI: https://doi.org/10.1007/s11432-022-3599-y