Scene Parsing and Fusion-Based Continuous Traversable Region Formation

Xiao, Xuhong; Ng, Gee Wah; Tan, Yuan Sin; Ye Chuan, Yeo

doi:10.1007/978-3-319-16628-5_28

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9008)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

Determining the category of each part of a scene and generating a continuous traversable region map in the physical coordinate system are crucial for autonomous vehicle navigation. This paper presents our work on both problems for an autonomous vehicle operating in open-terrain environments. Driven by ideas proposed in our Cognitive Architecture, we have designed novel strategies for the top-down facilitation process that explicitly interpret the spatial relationships between objects in the scene, and have incorporated a visual attention mechanism into the image-based scene parsing module. The scene parsing module processes images fast enough for real-time vehicle navigation. To alleviate the difficulty of path planning on sparse 3D occupancy grids, we propose an approach that interpolates the category of occupancy grid cells not hit by the 3D LIDAR, with reference to the aligned image-based scene parsing result, so that a continuous $2\frac{1}{2}$D traversable region map can be formed.
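The interpolation idea in the last sentence of the abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes the image-based parsing result has already been projected onto the same grid as the LIDAR labels, and the names `UNKNOWN`, `fill_unhit_cells`, `traversable_mask`, and the category ids are hypothetical.

```python
import numpy as np

UNKNOWN = -1  # hypothetical marker for grid cells with no LIDAR return


def fill_unhit_cells(lidar_labels, image_labels):
    """Return a grid where UNKNOWN cells take the aligned image-based label.

    Cells that LIDAR did label are left unchanged.
    """
    lidar_labels = np.asarray(lidar_labels)
    image_labels = np.asarray(image_labels)
    if lidar_labels.shape != image_labels.shape:
        raise ValueError("grids must be aligned to the same shape")
    return np.where(lidar_labels == UNKNOWN, image_labels, lidar_labels)


def traversable_mask(labels, traversable_categories):
    """Boolean mask of cells whose category counts as traversable."""
    return np.isin(labels, list(traversable_categories))


# Toy 3x4 grid with hypothetical category ids: 0 = road, 1 = grass, 2 = obstacle.
lidar = np.array([[0, -1, -1, 2],
                  [0, -1, 1, -1],
                  [-1, 0, -1, 2]])
image = np.array([[0, 0, 1, 2],
                  [0, 0, 0, 1],
                  [0, 0, 0, 2]])

filled = fill_unhit_cells(lidar, image)
mask = traversable_mask(filled, {0})
print(filled)
print(mask)
```

In this sketch a LIDAR label always wins over the image label where both exist; how the paper resolves such conflicts is not stated in the abstract.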

Author information

Authors and Affiliations

DSO National Laboratories, 20 Science Park Drive, Singapore, 118230, Singapore
Xuhong Xiao, Gee Wah Ng, Yuan Sin Tan & Yeo Ye Chuan

Corresponding author

Correspondence to Xuhong Xiao.

Editor information

Editors and Affiliations

Center for Visual Information Technology, International Institute of Information Technology, Hyderabad, India
C.V. Jawahar
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Shiguang Shan

About this paper

Cite this paper

Xiao, X., Ng, G.W., Tan, Y.S., Ye Chuan, Y. (2015). Scene Parsing and Fusion-Based Continuous Traversable Region Formation. In: Jawahar, C., Shan, S. (eds) Computer Vision - ACCV 2014 Workshops. ACCV 2014. Lecture Notes in Computer Science, vol 9008. Springer, Cham. https://doi.org/10.1007/978-3-319-16628-5_28

DOI: https://doi.org/10.1007/978-3-319-16628-5_28
Published: 12 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16627-8
Online ISBN: 978-3-319-16628-5
eBook Packages: Computer Science, Computer Science (R0)
