DOI: 10.1145/3274895.3274993
poster

Fusion of aerial lidar and images for road segmentation with deep CNN

Published: 06 November 2018

ABSTRACT

In this paper we attempt to fuse airborne Lidar and high-resolution images (0.5 feet per pixel) to identify road networks in a large geographic region. We perform pixel-wise segmentation, classifying each pixel as road or non-road based on color and depth features within a larger neighborhood context. The setting is bimodal: the RGB pixels represent the color space, and the depth values come from three-dimensional Lidar readings. We present multiple strategies for fusing Lidar and images. We describe TriSeg, a cost-effective, modular deep convolutional network design that achieves a better IoU on the aerial road segmentation problem than state-of-the-art RGB-only architectures. We also report on many other architectures and release our dataset for further research: https://bitbucket.org/biswas/fusion_lidar_images.
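To make the bimodal input and the IoU metric concrete, here is a minimal NumPy sketch. It is not the TriSeg architecture, and it is not necessarily one of the fusion strategies the authors evaluate: it assumes the simplest possible fusion, appending a normalized depth raster to the RGB channels, and it scores a binary road mask with intersection over union. The function names `stack_rgb_depth` and `road_iou` and the toy 4x4 tile are hypothetical and chosen only for illustration.

```python
import numpy as np


def stack_rgb_depth(rgb, depth):
    """Channel-stacking fusion (an assumed, illustrative strategy).

    rgb:   (H, W, 3) uint8 image tile.
    depth: (H, W) raster of Lidar-derived depth/height values.
    Returns an (H, W, 4) float32 array with all channels in [0, 1].
    """
    rgb = rgb.astype(np.float32) / 255.0
    d = depth.astype(np.float32)
    span = d.max() - d.min()
    # Min-max normalize depth; a flat raster maps to all zeros.
    d = (d - d.min()) / span if span > 0 else np.zeros_like(d)
    return np.concatenate([rgb, d[..., None]], axis=-1)


def road_iou(pred, truth):
    """Intersection over union for binary road / non-road masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        # Neither mask contains road pixels: treat as perfect agreement.
        return 1.0
    return float(np.logical_and(pred, truth).sum() / union)


# Toy 4x4 tile: black RGB image and a depth ramp.
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
depth = np.arange(16, dtype=np.float32).reshape(4, 4)
fused = stack_rgb_depth(rgb, depth)  # shape (4, 4, 4)

# Ground-truth road occupies columns 1-2; prediction occupies columns 2-3.
truth = np.zeros((4, 4), dtype=bool)
truth[:, 1:3] = True
pred = np.zeros((4, 4), dtype=bool)
pred[:, 2:4] = True

# Intersection is column 2 (4 pixels); union is columns 1-3 (12 pixels).
iou = road_iou(pred, truth)
print(fused.shape, round(iou, 3))
```

In this sketch a segmentation network would consume the four-channel `fused` tile and emit a per-pixel road mask, which `road_iou` compares against the ground truth.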


      Published in

        SIGSPATIAL '18: Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
        November 2018
        655 pages
        ISBN: 9781450358897
        DOI: 10.1145/3274895

        Copyright © 2018 Owner/Author

        Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 6 November 2018

        Qualifiers

        • poster

        Acceptance Rates

        SIGSPATIAL '18 Paper Acceptance Rate: 30 of 150 submissions, 20%
        Overall Acceptance Rate: 220 of 1,116 submissions, 20%
