3-D Vision and Recognition

Daniilidis, Kostas; Eklundh, Jan-Olof

doi:10.1007/978-3-540-30301-5_24

Abstract

In this chapter, we describe methods that can be applied on a robot equipped with one or more cameras. Our goal is to present representations and models for three-dimensional (3-D) motion and structure estimation as well as for recognition. We do not delve into estimation and inference issues, since these are treated extensively in other chapters. The same applies to fusion with other sensors, which we strongly encourage but do not describe here.

In the first part we describe the main methods for 3-D inference from two-dimensional (2-D) images. We are at the point where we could propose a recipe, at least for a small spatial extent. If we can track a few visual features in our images, we can estimate the self-motion of the robot as well as its pose with respect to any known landmark. Given solutions to the minimal-case problems, the obvious approach is to apply random sample consensus (RANSAC). If no known 3-D landmark is given, the estimated camera trajectory exhibits drift. From this trajectory, time windows spanning several frames are selected, and a dense 3-D depth map is obtained by solving the stereo problem. Large-scale reconstructions based on cameras alone still raise challenges with respect to drift and loop closing.
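To illustrate the combination of a minimal-case solver with random sample consensus mentioned above, here is a minimal sketch. It is not the chapter's pose or motion estimator: a two-point line fit stands in for a minimal pose solver, and the names `fit_line` and `ransac_line`, the threshold, the iteration count, and the synthetic data are all assumptions made for this example.

```python
import random


def fit_line(p, q):
    """Minimal solver: line a*x + b*y + c = 0 through two points, (a, b) unit length."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0.0:
        return None  # degenerate sample: the two points coincide
    return a / norm, b / norm, -(a * x1 + b * y1) / norm


def ransac_line(points, threshold=0.1, iterations=200, seed=0):
    """Keep the hypothesis from a minimal sample that gathers the most inliers."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = fit_line(*rng.sample(points, 2))
        if model is None:
            continue
        a, b, c = model
        # Point-to-line distance is |a*x + b*y + c| because (a, b) is unit length.
        inliers = [pt for pt in points if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Hypothetical data: 20 points on y = 2x + 1 plus 5 gross outliers.
outliers = [(3.0, 40.0), (7.0, -15.0), (12.0, 90.0), (15.0, 0.0), (18.0, -30.0)]
points = [(float(x), 2.0 * x + 1.0) for x in range(20)] + outliers
model, inliers = ransac_line(points)
```

In a pose or motion setting the same loop would presumably draw the minimal number of correspondences required by the chosen solver and score each hypothesis by an image-space error; a final refit on all inliers is a common refinement that is omitted here.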

In the second part we deal with recognition as applied to robotics. The main challenge here is to detect an instance of an object and recognize or categorize it. Since in robotics applications an object of interest always resides in a cluttered environment, any algorithm has to be insensitive to missing parts of the object and to outliers. The dominant paradigm is based on matching the appearance of pictures. Features are detected and quantized into visual words, and similarity is based on the difference between histograms of such visual words. Recognition has a long way to go, but robotics provides the opportunity to explore an object and be active in the recognition process.
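The visual-word pipeline described above can be sketched in a few lines. This is only an illustration under stated assumptions: the two-dimensional descriptors, the three-word vocabulary, and the helper names `nearest_word`, `word_histogram`, and `l1_distance` are hypothetical, the vocabulary is given rather than learned by clustering, and TF-IDF weighting is left out for brevity.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the closest visual word under squared Euclidean distance."""
    return min(
        range(len(vocabulary)),
        key=lambda k: sum((d - w) ** 2 for d, w in zip(descriptor, vocabulary[k])),
    )


def word_histogram(descriptors, vocabulary):
    """Quantize descriptors into visual words and return a normalized histogram."""
    counts = [0.0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def l1_distance(h1, h2):
    """Sum of absolute bin differences; 0 means identical histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


# Hypothetical vocabulary and per-image descriptors (toy 2-D values).
vocabulary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
image_a = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.9)]
image_b = [(0.0, 0.1), (1.0, 0.1), (0.9, 0.0), (0.1, 1.1)]
image_c = [(0.0, 1.0), (0.1, 0.9), (-0.1, 1.1), (0.0, 0.8)]

hist_a = word_histogram(image_a, vocabulary)
hist_b = word_histogram(image_b, vocabulary)
hist_c = word_histogram(image_c, vocabulary)
```

Here images A and B quantize to the same word counts and are therefore at distance zero, while image C, whose descriptors all fall on a single word, is far from both. Because the histogram discards where the features were found, the comparison tolerates some clutter and missing parts, at the cost of ignoring spatial layout.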

Abbreviations

DOG:: difference of Gaussians
GPS:: global positioning system
HMM:: hidden Markov model
IMU:: inertial measurement unit
MAP:: maximum a posteriori probability
MSER:: maximally stable extremal regions
RANSAC:: random sample consensus
RBC:: recognition-by-components
RGB:: red, green, blue
SIFT:: scale-invariant feature transform
SLAM:: simultaneous localization and mapping
SVD:: singular value decomposition
SfM:: structure from motion
TF-IDF:: term-frequency inverse document frequency

Author information

Authors and Affiliations

Department of Computer and Information Science GRASP Laboratory, University of Pennsylvania, 3330 Walnut Street, 19104, Philadelphia, PA, USA
Kostas Daniilidis Prof
KTH Royal Institute of Technology, Teknikringen 14, 10044, Stockholm, Sweden
Jan-Olof Eklundh Prof

Corresponding authors

Correspondence to Kostas Daniilidis or Jan-Olof Eklundh.

Editor information

Editors and Affiliations

PRISMA Lab, Dipartimento di Informatica e Sistemistica, Università degli Studi di Napoli Federico II, Via Claudio 21, 80125, Napoli, Italy
Bruno Siciliano Prof.
Artificial Intelligence Laboratory, Department of Computer Science, Stanford University, 94305-9010, Stanford, CA, USA
Oussama Khatib Prof.

About this entry

Cite this entry

Daniilidis, K., Eklundh, JO. (2008). 3-D Vision and Recognition. In: Siciliano, B., Khatib, O. (eds) Springer Handbook of Robotics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30301-5_24

DOI: https://doi.org/10.1007/978-3-540-30301-5_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23957-4
Online ISBN: 978-3-540-30301-5
eBook Packages: Engineering, Engineering (R0)
