Abstract
This paper proposes a fine semantic mapping method that uses a dense segmentation network (DS-Net) to improve the quality of semantic map fusion. First, the RGB and depth images are used to generate a dense indoor scene map via ElasticFusion, a state-of-the-art dense SLAM system. Then, the DS-Net, built on DenseNet's dense connections, performs precise semantic segmentation of the input RGB image. Finally, long-term correspondences between the indoor scene map and the landmarks are established from continuous frames in both visual odometry and loop closure detection, and the final semantic map is obtained by fusing the indoor scene map with the semantic predictions for RGB-D video frames taken from multiple viewpoints. Experiments were performed on the NYUv2, PASCAL VOC 2012, and CIFAR10 datasets and in our laboratory environments. The results show that our method reduces the error in dense map construction and achieves good semantic segmentation performance.
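The dense connection pattern that DS-Net borrows from DenseNet can be illustrated with a small sketch. In a dense block, every layer receives the concatenation of all earlier feature maps and contributes a fixed number of new channels (the growth rate). The code below is only a minimal illustration of that connectivity: the function name `dense_block`, the layer count, the growth rate, and the random 1x1 projection standing in for a trained BN-ReLU-Conv layer are all assumptions and are not taken from the paper's DS-Net architecture.

```python
import numpy as np


def dense_block(x, num_layers, growth_rate, rng):
    """Apply DenseNet-style dense connectivity to a feature map.

    x has shape (H, W, C). Each layer sees the concatenation of the input
    and all previously produced features, and adds `growth_rate` channels.
    Returns an array of shape (H, W, C + num_layers * growth_rate).
    """
    features = [x]
    for _ in range(num_layers):
        # Dense connection: concatenate every earlier feature map on channels.
        stacked = np.concatenate(features, axis=-1)
        # Hypothetical stand-in for a trained BN-ReLU-Conv layer:
        # a random 1x1 convolution (per-pixel linear map) followed by ReLU.
        weights = rng.standard_normal((stacked.shape[-1], growth_rate)) * 0.1
        new_features = np.maximum(stacked @ weights, 0.0)
        features.append(new_features)
    return np.concatenate(features, axis=-1)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))  # toy 8x8 feature map with 16 channels
out = dense_block(x, num_layers=4, growth_rate=12, rng=rng)
print(out.shape)  # (8, 8, 64): 16 input channels + 4 layers * 12 channels
```

Because the input is carried through by concatenation rather than summation, the first 16 output channels are exactly the original features, which is what lets later layers reuse early, fine-grained information.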
Acknowledgements
This work is the result of research projects funded by the National Natural Science Foundation of China (61873008) and the Beijing Natural Science Foundation (4182008, 4192010).
Cite this article
Zuo, G., Zheng, T., Liu, Y. et al. Fine semantic mapping based on dense segmentation network. Intel Serv Robotics 14, 47–60 (2021). https://doi.org/10.1007/s11370-020-00341-8
DOI: https://doi.org/10.1007/s11370-020-00341-8