
Fusing Inertial Data with Vision for Enhanced Image Understanding

  • Conference paper
Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2015)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 598))


Abstract

In this paper we show that combining knowledge of a camera's orientation with visual information improves the performance of semantic image segmentation. This rests on the assumption that the direction in which a camera faces acts as a prior on the content of the images it creates. We gathered egocentric video with a camera attached to a head-mounted display, and recorded its orientation using an inertial sensor. By combining orientation information with typical image descriptors, we show that segmentation accuracy on individual images improves from 61% to 71% over six classes, compared with vision alone. We also show that this method can be applied to both point-based and line-based image features, and that these can be combined for further benefit. Our resulting system would have applications in autonomous robot locomotion and in guiding visually impaired humans.
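The core idea in the abstract, that a camera's orientation acts as a prior on image content, can be illustrated by simply appending orientation measurements to a visual descriptor before classification. The sketch below is illustrative only: the toy data, the nearest-centroid classifier, and all names are assumptions, not the authors' actual pipeline (which combines inertial cues with point and line descriptors in a full segmentation framework).

```python
import numpy as np

def fuse_features(visual_desc, orientation):
    """Concatenate a visual descriptor with camera orientation
    (pitch, roll in radians) to form a fused feature vector."""
    return np.concatenate([visual_desc, orientation])

# Toy data: two classes ("floor" vs "wall") whose visual descriptors
# overlap heavily, but which become easier to separate once the
# camera pitch is appended as an extra feature (looking down vs ahead).
rng = np.random.default_rng(0)
floor = [fuse_features(rng.normal(0.0, 1.0, 8), np.array([-0.6, 0.0]))
         for _ in range(50)]
wall = [fuse_features(rng.normal(0.2, 1.0, 8), np.array([0.6, 0.0]))
        for _ in range(50)]

# Nearest-centroid classification on the fused vectors.
centroids = {"floor": np.mean(floor, axis=0), "wall": np.mean(wall, axis=0)}

def classify(x):
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

acc = np.mean([classify(x) == "floor" for x in floor] +
              [classify(x) == "wall" for x in wall])
print(f"accuracy with orientation feature: {acc:.2f}")
```

Here the visual descriptors alone are nearly indistinguishable, so most of the separability comes from the orientation dimensions, mirroring the paper's claim that orientation acts as a useful prior on scene content.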


Notes

  1. en.ids-imaging.com.

  2. www.oculus.com.

  3. www.docs.opencv.org.

  4. Code available at www.ipol.im/pub/art/2012/gjmr-lsd.

  5. Using the ‘gco-v3.0’ code at vision.csd.uwo.ca/code.

  6. Using code available at www.philkr.net/home/densecrf.

  7. Our dataset can be found at www.osianh.com/inertial.

  8. Videos available at www.osianh.com/inertial.


Acknowledgments

This work was funded by the UK Engineering and Physical Sciences Research Council (EP/J012025/1). The authors would like to thank Austin Gregg-Smith and Geoffrey Daniels for help with hardware and data, and Adeline Paiement for all the enlightening discussions.

Author information

Correspondence to Osian Haines.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Haines, O., Bull, D.R., Burn, J.F. (2016). Fusing Inertial Data with Vision for Enhanced Image Understanding. In: Braz, J., et al. Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2015. Communications in Computer and Information Science, vol 598. Springer, Cham. https://doi.org/10.1007/978-3-319-29971-6_11


  • DOI: https://doi.org/10.1007/978-3-319-29971-6_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-29970-9

  • Online ISBN: 978-3-319-29971-6

