Abstract:
Integrated Sentinel-1 synthetic aperture radar (SAR) imagery and Sentinel-2 optical imagery have shown great promise in mapping large-scale building height. Effectively fusing the complementary features of SAR and optical imagery is key to improving building height estimation. However, the significant heterogeneity between SAR and optical imagery makes accurate building height estimation challenging. In this article, we propose a hybrid multimodal fusion network (MF-BHNet) for building height estimation from Sentinel-1 SAR imagery and Sentinel-2 optical imagery. First, we design a hybrid multimodal encoder to mine modality-specific features and model intermodal correlations. In particular, an intramodal encoder (IME) is designed to reconstruct valuable intramodal information, and a transformer-based cross-modal encoder (CME) is used to model intermodal correlations and capture contextual information. Then, a coarse-to-fine progressive multimodal fusion method is proposed to fuse SAR and optical features and improve building height estimation performance. To validate our method, we construct a building height dataset by introducing superior building footprints. Experimental results demonstrate that MF-BHNet outperforms 11 state-of-the-art methods, achieving the lowest root-mean-square error (RMSE) of 3.6421 m. Moreover, compared with four publicly available building height products, the mapping result of the proposed method shows significant advantages in spatial detail and accuracy.
Published in: IEEE Transactions on Geoscience and Remote Sensing ( Volume: 62)