ABSTRACT
Road extraction from remote sensing images has become a prominent research topic in autonomous driving and smart city construction. In recent years, as computing power has grown, deep learning has been widely adopted in this field, and convolutional neural networks (CNNs) are commonly used to extract roads. However, because roads in remote sensing images are easily occluded by trees and buildings, the roads extracted by these methods are often fragmented. This paper presents PVT-Unet, a U-shaped neural network based on the Pyramid Vision Transformer. The network combines the Transformer's ability to learn long-range dependencies with the multi-scale feature extraction of a U-shaped network to predict roads accurately. Experimental results show that PVT-Unet outperforms state-of-the-art methods on all evaluation metrics on the Istanbul City Road Dataset. The source code is publicly available at https://github.com/XYQ1517/PVT-Unet.
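To illustrate why a Transformer can help with occluded roads, the following is a minimal sketch of single-head scaled dot-product self-attention on a toy strip of image patches. It is not the PVT-Unet implementation: the function names, the two-channel "road/tree" features, and the identity query/key/value projections are assumptions made for illustration, and the learned projections, positional encodings, and the Pyramid Vision Transformer's own attention design are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections.

    tokens: list of equal-length feature vectors, one per image patch.
    Returns (outputs, weights); weights[i][j] is how strongly patch i
    attends to patch j.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this patch to every patch, regardless of distance.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of all patch features.
        outputs.append(
            [sum(w * tok[d] for w, tok in zip(row, tokens)) for d in range(dim)]
        )
    return outputs, weights


# Hypothetical features: channel 0 = "road-like", channel 1 = "tree-like".
ROAD, TREE = [1.0, 0.0], [0.0, 1.0]
# A strip of five patches where a tree occludes the middle of the road.
patches = [ROAD, ROAD, TREE, ROAD, ROAD]
outputs, weights = self_attention(patches)
```

In this toy setup the first road patch attends to the last road patch as strongly as to its immediate neighbour, and the occluded middle patch receives a nonzero road component from the patches on both sides. A convolution with a small kernel would need several stacked layers to relate the two ends of the strip.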
Index Terms
- PVT-Unet: Road Extraction in Remote Sensing Imagery Based on U-shaped Pyramid Vision Transformer Neural Network