Abstract
In this chapter, we describe methods for a robot equipped with one or more camera sensors. Our goal is to present representations and models for three-dimensional (3-D) motion and structure estimation as well as for recognition. We do not delve into estimation and inference issues, since these are treated extensively in other chapters. The same applies to fusion with other sensors, which we strongly encourage but do not describe here.
In the first part we describe the main methods for 3-D inference from two-dimensional (2-D) images. We are at the point where we can propose a recipe, at least for a small spatial extent. If we are able to track a few visual features in our images, we are able to estimate the self-motion of the robot as well as its pose with respect to any known landmark. Because solutions exist for the minimal-case problems, the natural approach is to apply random sample consensus (RANSAC). If no known 3-D landmark is given, the estimated trajectory of the camera exhibits drift. From the trajectory of the camera, time windows spanning several frames are selected, and a dense 3-D depth map is obtained by solving the stereo problem. Large-scale reconstructions based on cameras alone raise challenges with respect to drift and loop closing.
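The random sample consensus step mentioned above can be illustrated with a minimal sketch. It is not the chapter's pose estimator: as a stand-in for a minimal-case pose solver, it uses the simplest minimal problem available, a 2-D line through two points. The function names `fit_line` and `ransac_line`, the threshold, the iteration count, and the synthetic data are all assumptions made for illustration.

```python
import random


def fit_line(p, q):
    """Minimal solver (illustrative stand-in): line a*x + b*y + c = 0
    through two points, with (a, b) normalized to unit length."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0.0:
        return None  # degenerate sample: both points coincide
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)


def ransac_line(points, threshold=0.1, iterations=200, seed=0):
    """Repeatedly fit a model to a random minimal sample and keep the
    model supported by the largest set of inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = fit_line(*rng.sample(points, 2))
        if model is None:
            continue
        a, b, c = model
        # With (a, b) of unit length, |a*x + b*y + c| is the point-line distance.
        inliers = [pt for pt in points
                   if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Synthetic data (assumed): 20 points on y = 2x + 1 plus 5 gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
points += [(0.0, 10.0), (5.0, -3.0), (10.0, 40.0), (15.0, 0.0), (3.0, 30.0)]

model, inliers = ransac_line(points)
print(len(inliers), "inliers out of", len(points))
```

In a vision setting the two-point line solver would be replaced by a minimal pose or relative-motion solver, and the point-to-line distance by a reprojection or epipolar error; the sampling and consensus loop stays the same.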
In the second part we deal with recognition as applied to robotics. The main challenge here is to detect an instance of an object and to recognize or categorize it. Since in robotics applications an object of interest always resides in a cluttered environment, any algorithm has to be insensitive to missing parts of the object and to outliers. The dominant paradigm is based on matching the appearance of pictures. Features are detected and quantized into visual words, and similarity is based on the difference between histograms of such visual words. Recognition has a long way to go, but robotics provides the opportunity to explore an object and to be active in the recognition process.
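The visual-word pipeline described above can be sketched as follows. This is a toy illustration under stated assumptions, not the chapter's method: the vocabulary is given by hand rather than learned by clustering, descriptors are 2-D tuples instead of real feature descriptors, and the L1 distance is only one possible choice for the difference between histograms. All function names are hypothetical.

```python
from collections import Counter


def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary centre closest to the descriptor
    (squared Euclidean distance)."""
    return min(
        range(len(vocabulary)),
        key=lambda k: sum((d - v) ** 2
                          for d, v in zip(descriptor, vocabulary[k])),
    )


def word_histogram(descriptors, vocabulary):
    """Quantize each descriptor to a visual word and return the
    normalized histogram of word frequencies."""
    if not descriptors:
        return [0.0] * len(vocabulary)
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    total = len(descriptors)
    return [counts[k] / total for k in range(len(vocabulary))]


def histogram_distance(h1, h2):
    """L1 distance between two histograms (an assumed choice of metric)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


# Hand-picked toy vocabulary of three visual words (assumed).
vocabulary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Toy descriptor sets standing in for features detected in three images.
image_a = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.9)]
image_b = [(0.0, 0.1), (1.0, 0.0), (0.9, 0.0), (0.1, 1.0)]
image_c = [(0.0, 0.1), (0.1, 0.9), (0.0, 1.1), (-0.1, 1.0)]

hist_a = word_histogram(image_a, vocabulary)
hist_b = word_histogram(image_b, vocabulary)
hist_c = word_histogram(image_c, vocabulary)

print("a vs b:", histogram_distance(hist_a, hist_b))
print("a vs c:", histogram_distance(hist_a, hist_c))
```

Because the histogram discards where each feature was found, the comparison tolerates clutter and missing parts to some degree, at the cost of ignoring spatial layout.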
Abbreviations
- DOG: difference of Gaussian
- GPS: global positioning system
- HMM: hidden Markov model
- IMU: inertial measurement units
- MAP: maximum a posteriori probability
- MSER: maximally stable extremal regions
- RANSAC: random sample consensus
- RBC: recognition-by-components
- RGB: red, green, blue
- SIFT: scale-invariant feature transformation
- SLAM: simultaneous localization and mapping
- SVD: singular value decomposition
- SfM: structure from motion
- TF-IDF: term-frequency inverse document frequency
Copyright information
© 2008 Springer-Verlag
About this entry
Cite this entry
Daniilidis, K., Eklundh, JO. (2008). 3-D Vision and Recognition. In: Siciliano, B., Khatib, O. (eds) Springer Handbook of Robotics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30301-5_24
DOI: https://doi.org/10.1007/978-3-540-30301-5_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23957-4
Online ISBN: 978-3-540-30301-5
eBook Packages: Engineering, Engineering (R0)