Abstract
In this paper we show that combining knowledge of a camera's orientation with visual information improves the performance of semantic image segmentation. This rests on the assumption that the direction in which a camera faces acts as a prior on the content of the images it creates. We gathered egocentric video with a camera attached to a head-mounted display, and recorded its orientation using an inertial sensor. By combining orientation information with typical image descriptors, we show that segmentation accuracy on individual images improves from 61% to 71% over six classes, compared with vision alone. We also show that this method can be applied to both point- and line-based image features, and that the two can be combined for further benefit. The resulting system would have applications in autonomous robot locomotion and in guiding visually impaired humans.
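As a rough illustration of the fusion idea in the abstract — appending the camera's orientation (e.g. pitch and roll from the inertial sensor) to each segment's appearance descriptor before classification — the following Python sketch uses synthetic data. The descriptor size, the pitch/roll encoding, the random-forest classifier, and the train/test split are all illustrative assumptions, not the authors' pipeline.

# A minimal sketch (not the authors' implementation): fuse per-segment
# appearance descriptors with IMU orientation and train one classifier
# on the combined feature vector.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000                                        # hypothetical number of image segments
X_app = rng.random((n, 64))                     # placeholder appearance descriptors
X_imu = rng.uniform(-np.pi / 2, np.pi / 2, (n, 2))  # assumed pitch/roll per frame
y = rng.integers(0, 6, n)                       # six semantic classes, as in the paper

X_fused = np.hstack([X_app, X_imu])             # orientation adds two extra features

# Train on the first 800 segments, test on the rest. With synthetic data the
# scores are meaningless; the point is the shape of the fused feature vector.
baseline = RandomForestClassifier(random_state=0).fit(X_app[:800], y[:800])
fused = RandomForestClassifier(random_state=0).fit(X_fused[:800], y[:800])
print("vision only:", baseline.score(X_app[800:], y[800:]))
print("vision + inertial:", fused.score(X_fused[800:], y[800:]))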
Notes
- 1.
- 2.
- 3.
- 4. Code available at www.ipol.im/pub/art/2012/gjmr-lsd.
- 5. Using the ‘gco-v3.0’ code at vision.csd.uwo.ca/code.
- 6. Using code available at www.philkr.net/home/densecrf.
- 7. Our dataset can be found at www.osianh.com/inertial.
- 8. Videos available at www.osianh.com/inertial.
Acknowledgments
This work was funded by the UK Engineering and Physical Sciences Research Council (EP/J012025/1). The authors would like to thank Austin Gregg-Smith and Geoffrey Daniels for help with hardware and data, and Adeline Paiement for all the enlightening discussions.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Haines, O., Bull, D.R., Burn, J.F. (2016). Fusing Inertial Data with Vision for Enhanced Image Understanding. In: Braz, J., et al. Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2015. Communications in Computer and Information Science, vol 598. Springer, Cham. https://doi.org/10.1007/978-3-319-29971-6_11
DOI: https://doi.org/10.1007/978-3-319-29971-6_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29970-9
Online ISBN: 978-3-319-29971-6
eBook Packages: Computer Science (R0)