3-D Vision and Recognition

  • Reference work entry
Springer Handbook of Robotics

Abstract

In this chapter, we describe methods for a robot equipped with one or more cameras. Our goal is to present representations and models for three-dimensional (3-D) motion and structure estimation as well as for recognition. We do not delve into estimation and inference issues, since these are treated extensively in other chapters. The same applies to fusion with other sensors, which we strongly encourage but do not describe here.

In the first part we describe the main methods for 3-D inference from two-dimensional (2-D) images. The field is at the point where we can propose a recipe, at least for scenes of small spatial extent. If we are able to track a few visual features across images, we can estimate the self-motion of the robot as well as its pose with respect to any known landmark. Since solutions exist for the minimal-case problems, the natural approach is to apply random sample consensus (RANSAC). If no known 3-D landmark is given, the estimated camera trajectory exhibits drift. From the camera trajectory, time windows spanning several frames are selected, and a dense 3-D depth map is obtained by solving the stereo problem. Large-scale reconstructions based on cameras alone still raise challenges with respect to drift and loop closing.
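The hypothesize-and-verify loop behind random sample consensus can be sketched as follows. This is a minimal illustration, not the chapter's pose-estimation pipeline: it swaps the minimal pose solvers for a toy 2-D line fit, whose minimal sample is two points. The function names, the inlier threshold, the iteration count, and the synthetic data are all assumptions made for the example.

```python
import random


def fit_line(p, q):
    """Minimal solver: line a*x + b*y + c = 0 through two points, unit normal."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0.0:
        return None  # degenerate sample: the two points coincide
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)


def ransac_line(points, threshold=0.1, iterations=200, seed=0):
    """Return the line supported by the most points, and those inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # Hypothesize: solve the minimal problem on a random sample.
        model = fit_line(*rng.sample(points, 2))
        if model is None:
            continue
        a, b, c = model
        # Verify: count points whose distance to the line is within threshold.
        inliers = [pt for pt in points
                   if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Synthetic data (assumed): 20 points on y = 2x + 1 plus 5 gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
points += [(3.0, 40.0), (7.0, -15.0), (12.0, 90.0), (15.0, -30.0), (18.0, 5.0)]

model, inliers = ransac_line(points)
slope = -model[0] / model[1]
print(len(inliers), round(slope, 6))
```

For camera pose or relative motion the same loop applies, with the two-point line solver replaced by a minimal pose solver and the point-to-line distance replaced by a reprojection or epipolar error.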

In the second part we deal with recognition as applied to robotics. The main challenge here is to detect an instance of an object and to recognize or categorize it. Since in robotics applications an object of interest always resides in a cluttered environment, any algorithm has to be insensitive to outliers and to missing parts of the object. The dominant paradigm is based on matching the appearance of images. Features are detected and quantized into visual words, and similarity is based on the difference between histograms of such visual words. Recognition has a long way to go, but robotics provides the opportunity to explore an object and to be active in the recognition process.
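The visual-word comparison described above can be sketched in a few lines. This is an illustrative toy under stated assumptions: the descriptors are 2-D tuples rather than real feature descriptors, the vocabulary is given by hand instead of being learned by clustering, and the L1 distance between normalized histograms is one possible choice of "difference". All names and data are hypothetical, and weighting schemes such as TF-IDF are omitted.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary centre closest to the descriptor (squared Euclidean)."""
    return min(
        range(len(vocabulary)),
        key=lambda k: sum((d - v) ** 2 for d, v in zip(descriptor, vocabulary[k])),
    )


def word_histogram(descriptors, vocabulary):
    """Quantize descriptors into visual words and return a normalized histogram."""
    counts = [0.0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def histogram_distance(h1, h2):
    """L1 distance between two histograms (assumed similarity measure)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


# Hypothetical vocabulary of three visual words and three toy "images".
vocabulary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
image_a = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.9)]
image_b = [(0.0, 0.1), (1.0, 0.1), (0.9, 0.0), (0.1, 1.1)]
image_c = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.0), (0.0, 1.0)]

hist_a = word_histogram(image_a, vocabulary)
hist_b = word_histogram(image_b, vocabulary)
hist_c = word_histogram(image_c, vocabulary)

print(histogram_distance(hist_a, hist_b))  # same word mix, small distance
print(histogram_distance(hist_a, hist_c))  # different word mix, larger distance
```

Because the histogram discards where each feature was found, the comparison tolerates clutter and partial occlusion to some degree, at the cost of ignoring geometric layout.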

Abbreviations

DOG: difference of Gaussians

GPS: global positioning system

HMM: hidden Markov model

IMU: inertial measurement unit

MAP: maximum a posteriori

MSER: maximally stable extremal regions

RANSAC: random sample consensus

RBC: recognition-by-components

RGB: red, green, blue

SIFT: scale-invariant feature transform

SLAM: simultaneous localization and mapping

SVD: singular value decomposition

SfM: structure from motion

TF-IDF: term frequency-inverse document frequency

Author information

Corresponding authors

Correspondence to Prof. Kostas Daniilidis or Prof. Jan-Olof Eklundh.

Copyright information

© 2008 Springer-Verlag

About this entry

Cite this entry

Daniilidis, K., Eklundh, JO. (2008). 3-D Vision and Recognition. In: Siciliano, B., Khatib, O. (eds) Springer Handbook of Robotics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30301-5_24

  • DOI: https://doi.org/10.1007/978-3-540-30301-5_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23957-4

  • Online ISBN: 978-3-540-30301-5

  • eBook Packages: Engineering, Engineering (R0)
