Multi-view Object Categorization and Pose Estimation

Savarese, Silvio; Fei-Fei, Li

doi:10.1007/978-3-642-12848-6_8

Part of the book series: Studies in Computational Intelligence (SCI, volume 285)

Abstract

Object and scene categorization has been a central topic of computer vision research in recent years. The problem is highly challenging: a single object may show tremendous variability in appearance and structure under different photometric and geometric conditions, and members of the same class may differ from one another because of intra-class variability. Recently, researchers have proposed new models with two goals: i) finding a representation that efficiently captures the intrinsic three-dimensional and multi-view nature of object categories; ii) exploiting this representation to help the recognition and categorization task. In this chapter we review recent approaches aimed at tackling this problem and focus on the work by Savarese & Fei-Fei [54, 55]. In [54, 55], multi-view object models are obtained by linking together diagnostic parts of the objects seen from different viewpoints. Instead of recovering a full 3D geometry, parts are connected through their mutual homographic transformations. The resulting model is a compact summary of both the appearance and the geometry of the object class. We show that such a model can be learnt with minimal supervision compared to competing techniques. The model can be used to detect objects under arbitrary and/or unseen poses by means of a two-step algorithm. This algorithm, inspired by work on single-object view synthesis (e.g., Seitz & Dyer [57]), can synthesize object appearance and shape properties at recognition time and, in turn, estimate the object pose that best matches the observations. We conclude this chapter by presenting experiments on detection, recognition and pose estimation on two datasets from [54, 55] as well as on the PASCAL Visual Object Classes (VOC) dataset [15]. Experiments indicate that the representation and algorithms presented in [54, 55] can be successfully employed in a number of generic object recognition tasks.
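
To make the idea of connecting parts through homographic transformations more concrete, the following minimal sketch maps the corners of a planar part from one view into another using a 3x3 homography. It is only an illustration of the general operation: the function names, the matrix H_ij and the coordinates are hypothetical and are not taken from [54, 55].

```python
# Illustrative sketch only: names and numeric values are hypothetical,
# not the model or the parameters used in the chapter.

def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography H given as nested lists."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under H")
    # Divide by the homogeneous coordinate to return to image coordinates.
    return (xh / w, yh / w)


def transfer_part(H, corners):
    """Transfer every corner of a planar part from one view to another."""
    return [apply_homography(H, c) for c in corners]


# Hypothetical homography linking a part in view i to the same part in view j.
H_ij = [
    [1.0, 0.2, 5.0],
    [0.0, 1.0, -3.0],
    [0.0, 0.001, 1.0],
]

# Corners of the part as observed in view i (arbitrary pixel coordinates).
part_in_view_i = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
part_in_view_j = transfer_part(H_ij, part_in_view_i)
```

A representation of this kind stores one such transformation per pair of linked parts rather than a full 3D reconstruction, which is why the abstract describes the model as compact.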

References

[1] The princeton shape benchmark. In: Proceedings of the Shape Modeling International, pp. 167–178 (2004)
[2] Arie-Nachimson, M., Basri, R.: Constructing implicit 3d shape models for pose estimation. In: Proceedings of the International Conference on Computer Vision (2009)
[3] Ballard, D.H.: Generalizing the hough transform to detect arbitrary shapes. Pattern Recognition 13(2), 111–122 (1981)
[4] Bart, E., Byvatov, E., Ullman, S.: View-invariant recognition using corresponding object fragments. In: Pajdla, T., Matas, J(G.) (eds.) ECCV 2004. LNCS, vol. 3022, pp. 152–165. Springer, Heidelberg (2004)
[5] Bowyer, K., Dyer, C.: Aspect graphs: An introduction and survey of recent results. International Journal of Imaging Systems and Technology 2(4), 315–328 (1990)
[6] Brown, M., Lowe, D.G.: Unsupervised 3d object recognition and reconstruction in unordered datasets. In: Proceedings of the Fifth International Conference on 3-D Digital Imaging and Modeling, pp. 56–63 (2005)
[7] Burl, M.C., Weber, M., Perona, P.: A probabilistic approach to object recognition using local photometry and global geometry. In: Burkhardt, H., Neumann, B. (eds.) ECCV 1998. LNCS, vol. 1407, p. 628. Springer, Heidelberg (1998)
[8] Chen, S., Williams, L.: View interpolation for image synthesis. Computer Graphics 27, 279–288 (1993)
[9] Chiu, H.P., Kaelbling, L.P., Lozano-Perez, T.: Virtual training for multi-view object class recognition. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2007)
[10] Cyr, C., Kimia, B.: A similarity-based aspect-graph approach to 3D object recognition. International Journal of Computer Vision 57(1), 5–22 (2004)
[11] Dance, C., Willamowski, J., Fan, L., Bray, C., Csurka, G.: Visual categorization with bags of keypoints. In: Proceedings of the ECCV International Workshop on Statistical Learning in Computer Vision (2004)
[12] Dickinson, S.J., Pentland, A.P., Rosenfeld, A.: 3-d shape recovery using distributed aspect matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 14(2), 174–198 (1992)
[13] Eggert, D., Bowyer, K.: Computing the perspective projection aspect graph of solids of revolution. IEEE Transactions on Pattern Analysis and Machine Intelligence 15(2), 109–128 (1993)
[14] Eggert, D., Bowyer, K., Dyer, C., Christensen, H., Goldgof, D.: The scale space aspect graph. IEEE Transactions on Pattern Analysis and Machine Intelligence 15(11), 1114–1130 (1993)
[15] Everingham, M., et al.: The 2005 pascal visual object class challenge. In: Proceedings of the 1st PASCAL Challenges Workshop (to appear)
[16] Farhadi, A., Tabrizi, J., Endres, I., Forsyth, D.: A latent model of discriminative aspect. In: Proceedings of the International Conference on Computer Vision (2009)
[17] Fei-Fei, L., Fergus, R., Torralba, A.: Recognizing and learning object categories. CVPR Short Course (2007)
[18] Felzenszwalb, P., Huttenlocher, D.: Pictorial structures for object recognition. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition, pp. 2066–2073 (2000)
[19] Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition, pp. 264–271 (2003)
[20] Ferrari, V., Tuytelaars, T., Van Gool, L.: Simultaneous object recognition and segmentation from single or multiple model views. International Journal of Computer Vision (2006)
[21] Fischler, M., Bolles, R.: Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Comm. of the ACM 24, 381–395 (1981)
[22] Frome, A., Huber, D., Kolluri, R., Bulow, T., Malik, J.: Recognizing objects in range data using regional point descriptors. In: Pajdla, T., Matas, J(G.) (eds.) ECCV 2004. LNCS, vol. 3023, pp. 224–237. Springer, Heidelberg (2004)
[23] Fulkerson, B., Vedaldi, A., Soatto, S.: Class Segmentation and Object Localization with Superpixel Neighborhoods. In: Proceedings of the International Conference on Computer Vision (2009)
[24] Grimson, W., Lozano-Perez, T.: Recognition and localization of overlapping parts in two and three dimensions. In: Proceedings of the International Conference on Robotics and Automation, pp. 61–66 (1985)
[25] Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision, 2nd edn. Cambridge University Press, Cambridge (2004)
[26] Hetzel, G., Leibe, B., Levi, P., Schiele, B.: 3d object recognition from range images using local feature histograms. IEEE Transactions on Pattern Analysis and Machine Intelligence 2 (2001)
[27] Hoiem, D., Rother, C., Winn, J.: 3d layout crf for multi-view object class recognition and segmentation. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2007)
[28] Johnson, A., Hebert, M.: Using spin images for efficient object recognition in cluttered 3d scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence 5 (1999)
[29] Kadir, T., Brady, M.: Scale, saliency and image description. International Journal of Computer Vision 45(2), 83–105 (2001)
[30] Kazhdan, M., Funkhouser, T., Rusinkiewicz, S.: Rotation invariant spherical harmonic representation of 3d shape descriptors. In: Proceedings of the Symposium on Geometry Processing (2003)
[31] Koenderink, J., van Doorn, A.: The singularities of the visual mappings. Biological Cybernetics 24(1), 51–59 (1976)
[32] Koenderink, J.J., van Doorn, A.J.: The internal representation of solid shape with respect to vision. Biological Cybernetics 32(4), 211–216 (1979)
[33] Kushal, A., Schmid, C., Ponce, J.: Flexible object models for category-level 3d object recognition. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2007)
[34] Lazebnik, S., Schmid, C., Ponce, J.: Semi-local affine parts for object recognition. In: Proceedings of the British Machine Vision Conference, vol. 2, pp. 959–968 (2004)
[35] Leibe, B., Schiele, B.: Scale Invariant Object Categorization Using a Scale-Adaptive Mean-Shift Search. In: Rasmussen, C.E., Bülthoff, H.H., Schölkopf, B., Giese, M.A. (eds.) DAGM 2004. LNCS, vol. 3175, pp. 145–153. Springer, Heidelberg (2004)
[36] Li, X., Guskov, I., Barhak, J.: Feature-based alignment of range scan data to cad model. International Journal of Shape Modeling 13, 1–23 (2007)
[37] Liebelt, J., Schmid, C., Schertler, K.: Viewpoint-independent object class detection using 3d feature maps. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2008)
[38] Lowe, D.: Object recognition from local scale-invariant features. In: Proceedings of the International Conference on Computer Vision, pp. 1150–1157 (1999)
[39] Lowe, D.G.: Three-dimensional object recognition from single two-dimensional images. Artificial Intelligence 31, 355–395 (1987)
[40] Lowe, D.G.: Local feature view clustering for 3d object recognition. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2001)
[41] Marr, D.: Vision: A computational investigation into the human representation and processing of visual information. Freeman, New York (1982)
[42] Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: Proceedings of the British Machine Vision Conference, pp. 384–393 (2002)
[43] Mei, L., Sun, M., Carter, K., Hero, A., Savarese, S.: Object pose classification from short video sequences. In: Proceedings of the British Machine Vision Conference (2009)
[44] Mikolajczyk, K., Schmid, C.: An affine invariant interest point detector. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 128–142. Springer, Heidelberg (2002)
[45] Mikolajczyk, K., Schmid, C.: Scale and affine invariant interest point detectors. International Journal of Computer Vision 60(1), 63–86 (2004)
[46] Murase, H., Nayar, S.K.: Learning by a generation approach to appearance-based object recognition. In: Proceedings of the International Conference on Pattern Recognition (1996)
[47] Nayar, S.K., Nene, S.A., Murase, H.: Real-time 100 object recognition system. In: Proceedings of the International Conference on Robotics and Automation, pp. 2321–2325 (1996)
[48] Ng, J., Gong, S.: Multi-view face detection and pose estimation using a composite support vector machine across the view sphere. In: Proceedings of the International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems (1999)
[49] Ozuysal, M., Lepetit, V., Fua, P.: Pose estimation for category specific multiview object localization. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2009)
[50] Rothganger, F., Lazebnik, S., Schmid, C., Ponce, J.: 3d object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints. International Journal of Computer Vision 66(3), 231–259 (2006)
[51] Rothwell, C.A., Zisserman, A., Forsyth, D.A., Mundy, J.L., Joseph, L.: Canonical frames for planar object recognition. In: Sandini, G. (ed.) ECCV 1992. LNCS, vol. 588. Springer, Heidelberg (1992)
[52] Ruiz-Correa, S., Shapiro, L., Meila, M.: A new signature-based method for efficient 3-d object recognition. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2001)
[53] Russell, B., Torralba, A., Murphy, K., Freeman, W.: Labelme: a database and web-based tool for image annotation. International Journal of Computer Vision (in press)
[54] Savarese, S., Fei-Fei, L.: 3D generic object categorization, localization and pose estimation. In: Proceedings of the International Conference on Computer Vision, pp. 1–8 (2007)
[55] Savarese, S., Fei-Fei, L.: View synthesis for recognizing unseen poses of object classes. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part III. LNCS, vol. 5304, pp. 602–615. Springer, Heidelberg (2008)
[56] Schneiderman, H., Kanade, T.: A statistical approach to 3D object detection applied to faces and cars. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition, pp. 746–751 (2000)
[57] Seitz, S., Dyer, C.: View morphing. In: Proceedings of the ACM SIGGRAPH, pp. 21–30 (1996)
[58] Shimshoni, I., Ponce, J.: Finite-resolution aspect graphs of polyhedral objects. IEEE Transactions on Pattern Analysis and Machine Intelligence 19(4), 315–327 (1997)
[59] Stewman, J., Bowyer, K.: Learning graph matching. In: Proceedings of the International Conference on Computer Vision, pp. 494–500 (1988)
[60] Su, H., Sun, M., Fei-Fei, L., Savarese, S.: Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories. In: Proceedings of the International Conference on Computer Vision (2009)
[61] Sun, M., Su, H., Savarese, S., Fei-Fei, L.: A multi-view probabilistic model for 3d object classes. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2009)
[62] Tangelder, J.W.H., Veltkamp, R.C.: A survey of content based 3d shape retrieval methods. In: Proceedings of Shape Modeling Applications, pp. 145–156 (2004)
[63] Thomas, A., Ferrari, V., Leibe, B., Tuytelaars, T., Schiele, B., Van Gool, L.: Towards multi-view object class detection. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition, pp. 1589–1596 (2006)
[64] Torralba, A., Murphy, K., Freeman, W.: Sharing features: efficient boosting procedures for multiclass object detection. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2004)
[65] Ullman, S., Basri, R.: Recognition by linear combination of models. Technical Report, Cambridge, MA, USA (1989)
[66] Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition, pp. 511–518 (2001)
[67] Weber, M., Einhäuser, W., Welling, M., Perona, P.: Viewpoint-invariant learning and detection of human heads. In: Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition (2000)
[68] Weber, M., Welling, M., Perona, P.: Unsupervised learning of models for recognition. In: Vernon, D. (ed.) ECCV 2000. LNCS, vol. 1842, pp. 101–108. Springer, Heidelberg (2000)
[69] Xiao, J., Chen, J., Yeung, D.Y., Quan, L.: Structuring visual words in 3d for arbitrary-view object localization. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part III. LNCS, vol. 5304, pp. 725–737. Springer, Heidelberg (2008)
[70] Yan, P., Khan, D., Shah, M.: 3d model based object class detection in an arbitrary view. In: Proceedings of the International Conference on Computer Vision (2007)
[71] Li, Y., Gu, L., Kanade, T.: A robust shape model for multi-view car alignment. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2009)
[72] Zhang, Z.: Floatboost learning and statistical face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(9), 1112–1123 (2004)

Author information

Authors and Affiliations

Department of Electrical Engineering, University of Michigan at Ann Arbor, USA
Silvio Savarese
Department of Computer Science, Stanford University, USA
Li Fei-Fei

Editor information

Editors and Affiliations

Department of Engineering, University of Cambridge, CB2 1PZ, Cambridge, UK
Roberto Cipolla
Dipartimento di Matematica ed Informatica, University of Catania, Viale A. Doria 6, I-95125, Catania, Italy
Sebastiano Battiato & Giovanni Maria Farinella

About this chapter

Cite this chapter

Savarese, S., Fei-Fei, L. (2010). Multi-view Object Categorization and Pose Estimation. In: Cipolla, R., Battiato, S., Farinella, G.M. (eds) Computer Vision. Studies in Computational Intelligence, vol 285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12848-6_8

DOI: https://doi.org/10.1007/978-3-642-12848-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12847-9
Online ISBN: 978-3-642-12848-6
eBook Packages: Engineering, Engineering (R0)
