Abstract
Determining the categories of the different parts of a scene and generating a continuous traversable region map in the physical coordinate system are both crucial for autonomous vehicle navigation. This paper presents our work on these two aspects for an autonomous vehicle operating in an open-terrain environment. Driven by ideas proposed in our Cognitive Architecture, we have designed novel strategies for the top-down facilitation process that explicitly interpret the spatial relationships between objects in the scene, and we have incorporated a visual attention mechanism into the image-based scene parsing module. The scene parsing module processes images fast enough for real-time vehicle navigation applications. To alleviate the difficulty of using sparse 3D occupancy grids for path planning, we propose an approach that interpolates the category of occupancy grid cells not hit by the 3D LIDAR, with reference to the aligned image-based scene parsing result, so that a continuous \(2\frac{1}{2}D\) traversable region map can be formed.
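The grid-filling idea described above can be illustrated with a minimal sketch. It is not the authors' method, which the abstract does not specify. It only assumes that each cell without a LIDAR return borrows the label of the image pixel its centre projects to. The names `UNKNOWN`, `fill_unhit_cells`, `cell_centers`, and the projection matrix `P` are hypothetical and introduced only for illustration.

```python
import numpy as np

UNKNOWN = -1  # hypothetical marker for grid cells with no LIDAR return


def fill_unhit_cells(grid_labels, cell_centers, image_labels, P):
    """Fill UNKNOWN cells with the label of the pixel each cell centre projects to.

    grid_labels : (H, W) int array, UNKNOWN where no LIDAR point fell
    cell_centers: (H, W, 3) array of cell centres in the camera frame
    image_labels: (rows, cols) int array from an image-based scene parser
    P           : (3, 4) camera projection matrix, assumed known from calibration
    """
    filled = grid_labels.copy()
    rows, cols = image_labels.shape
    # Visit only the cells that LIDAR did not label.
    for i, j in zip(*np.nonzero(grid_labels == UNKNOWN)):
        x, y, z = cell_centers[i, j]
        u, v, w = P @ np.array([x, y, z, 1.0])
        if w <= 0:  # cell centre lies behind the camera, leave it unknown
            continue
        c, r = int(round(u / w)), int(round(v / w))
        if 0 <= r < rows and 0 <= c < cols:
            filled[i, j] = image_labels[r, c]
    return filled
```

Cells that project outside the image, or behind the camera, stay `UNKNOWN` in this sketch; a real system would need a further rule for them, such as spatial interpolation from neighbouring cells.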
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Xiao, X., Ng, G.W., Tan, Y.S., Ye Chuan, Y. (2015). Scene Parsing and Fusion-Based Continuous Traversable Region Formation. In: Jawahar, C., Shan, S. (eds) Computer Vision - ACCV 2014 Workshops. ACCV 2014. Lecture Notes in Computer Science, vol 9008. Springer, Cham. https://doi.org/10.1007/978-3-319-16628-5_28
DOI: https://doi.org/10.1007/978-3-319-16628-5_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16627-8
Online ISBN: 978-3-319-16628-5
eBook Packages: Computer Science (R0)