Deep Multispectral Semantic Scene Understanding of Forested Environments Using Multimodal Fusion

  • Conference paper

Part of the book series: Springer Proceedings in Advanced Robotics (SPAR, volume 1)

Abstract

Semantic scene understanding of unstructured environments is a highly challenging task for robots operating in the real world. Deep Convolutional Neural Network architectures define the state of the art in various segmentation tasks. So far, researchers have focused on segmentation with RGB data. In this paper, we study the use of multispectral and multimodal images for semantic segmentation and develop fusion architectures that learn from RGB, Near-InfraRed channels, and depth data. We introduce a first-of-its-kind multispectral segmentation benchmark that contains 15,000 images and 366 pixel-wise ground truth annotations of unstructured forest environments. We identify new data augmentation strategies that enable training of very deep models using relatively small datasets. We show that our UpNet architecture exceeds the state of the art both qualitatively and quantitatively on our benchmark. In addition, we present experimental results for segmentation under challenging real-world conditions. The benchmark and a demo are publicly available at http://deepscene.cs.uni-freiburg.de.
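The two ideas the abstract highlights, multimodal fusion and small-dataset augmentation, can be made concrete with short sketches. First, a minimal late-fusion segmentation network over RGB and Near-InfraRed (NIR) inputs, written in PyTorch. This is not the paper's UpNet: the names (Stream, LateFusionSegNet, ndvi), the layer widths, the derived NDVI channel (NDVI = (NIR - Red) / (NIR + Red), a standard vegetation cue), and the concatenate-then-1x1-convolution fusion are all illustrative assumptions chosen for brevity.

```python
# A minimal late-fusion sketch for RGB + NIR semantic segmentation.
# NOT the paper's UpNet: layer widths, the NDVI input channel, and the
# concatenate-then-1x1-conv fusion are assumptions chosen for brevity.
import torch
import torch.nn as nn


def ndvi(rgb: torch.Tensor, nir: torch.Tensor) -> torch.Tensor:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    red = rgb[:, 0:1]                      # assumes channel order R, G, B
    return (nir - red) / (nir + red + 1e-6)


class Stream(nn.Module):
    """A tiny encoder-decoder stream, standing in for a deep segmentation net."""
    def __init__(self, in_ch: int, feat: int = 16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(feat, 2 * feat, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.dec = nn.Sequential(          # upsample back to input resolution
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(2 * feat, feat, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dec(self.enc(x))


class LateFusionSegNet(nn.Module):
    """Two modality-specific streams, fused just before the pixel classifier."""
    def __init__(self, n_classes: int = 6):
        super().__init__()
        self.rgb_stream = Stream(in_ch=3)
        self.nir_stream = Stream(in_ch=2)  # raw NIR + derived NDVI
        self.classifier = nn.Conv2d(16 + 16, n_classes, kernel_size=1)

    def forward(self, rgb: torch.Tensor, nir: torch.Tensor) -> torch.Tensor:
        nir_in = torch.cat([nir, ndvi(rgb, nir)], dim=1)
        fused = torch.cat([self.rgb_stream(rgb), self.nir_stream(nir_in)], dim=1)
        return self.classifier(fused)      # per-pixel class scores


# Smoke test on random tensors shaped like a small image batch.
model = LateFusionSegNet(n_classes=6)
rgb, nir = torch.rand(2, 3, 64, 64), torch.rand(2, 1, 64, 64)
print(model(rgb, nir).shape)               # torch.Size([2, 6, 64, 64])
```

Late fusion after modality-specific streams is only one plausible design point; stacking all channels at the input (early fusion) is the natural baseline, and the abstract's plural "fusion architectures" suggests the full text compares several variants.

Second, the abstract credits new data augmentation strategies with enabling very deep models on a relatively small annotated set (366 ground-truth images). The specific strategies are described in the full text; the generic, label-preserving pipeline below is a hedged stand-in, not the authors' recipe.

```python
# Generic label-preserving augmentation for segmentation on a small dataset.
# These transforms (flip, crop, brightness jitter) are common examples only,
# not necessarily the strategies identified in the paper.
import torch


def augment(image: torch.Tensor, label: torch.Tensor):
    """image: (C, H, W) float in [0, 1]; label: (H, W) integer class mask.
    Geometric transforms must hit image and mask identically."""
    if torch.rand(()) < 0.5:                         # random horizontal flip
        image, label = image.flip(-1), label.flip(-1)
    h, w = label.shape                               # random crop to 90% size
    ch, cw = int(0.9 * h), int(0.9 * w)
    top = int(torch.randint(0, h - ch + 1, ()))
    left = int(torch.randint(0, w - cw + 1, ()))
    image = image[:, top:top + ch, left:left + cw]
    label = label[top:top + ch, left:left + cw]
    # Photometric jitter touches the image only; labels stay untouched.
    gain = 0.8 + 0.4 * torch.rand(())
    return (image * gain).clamp(0.0, 1.0), label
```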

This work has partly been supported by the European Commission under FP7-267686-LIFENAV and FP7-610603-EUROPA2.

Author information

Corresponding author

Correspondence to Abhinav Valada.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Valada, A., Oliveira, G.L., Brox, T., Burgard, W. (2017). Deep Multispectral Semantic Scene Understanding of Forested Environments Using Multimodal Fusion. In: Kulić, D., Nakamura, Y., Khatib, O., Venture, G. (eds) 2016 International Symposium on Experimental Robotics. ISER 2016. Springer Proceedings in Advanced Robotics, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-319-50115-4_41

  • DOI: https://doi.org/10.1007/978-3-319-50115-4_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50114-7

  • Online ISBN: 978-3-319-50115-4

  • eBook Packages: Engineering, Engineering (R0)
