Abstract
Object and scene categorization has been a central topic of computer vision research in recent years. The problem is highly challenging. A single object may show tremendous variability in appearance and structure under various photometric and geometric conditions. In addition, members of the same class may differ from each other due to various degrees of intra-class variability. Recently, researchers have proposed new models with two goals: i) finding a suitable representation that can efficiently capture the intrinsic three-dimensional and multi-view nature of object categories; ii) taking advantage of this representation to help the recognition and categorization task. In this Chapter we review recent approaches aimed at tackling this challenging problem and focus on the work by Savarese & Fei-Fei [54, 55]. In [54, 55], multi-view object models are obtained by linking together diagnostic parts of the objects from different viewpoints. Instead of recovering the full 3D geometry, parts are connected through their mutual homographic transformations. The resulting model is a compact summarization of both the appearance and geometry information of the object class. We show that such a model can be learnt with minimal supervision compared to competing techniques. The model can be used to detect objects under arbitrary and/or unseen poses by means of a two-step algorithm. This algorithm, inspired by work on single-object view synthesis (e.g., Seitz & Dyer [57]), can synthesize object appearance and shape properties at recognition time and, in turn, estimate the object pose that best matches the observations. We conclude this Chapter by presenting experimental results on detection, recognition and pose estimation on the two datasets in [54, 55] as well as on the PASCAL Visual Object Classes (VOC) dataset [15]. Experiments indicate that the representation and algorithms presented in [54, 55] can be successfully employed in a number of generic object recognition tasks.
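To make the idea of connecting parts through homographic transformations concrete, the following is a minimal sketch rather than the method of [54, 55]. It assumes a part is described by the four corners of a planar patch and that a 3x3 homography between two views is already known. The names apply_homography, part_view_a and H_ab, and all numeric values, are hypothetical and chosen only for illustration.

```python
import numpy as np

def apply_homography(H, points):
    """Map an (N, 2) array of image points through a 3x3 homography H."""
    pts = np.asarray(points, dtype=float)
    # Lift to homogeneous coordinates: (x, y) -> (x, y, 1).
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])
    mapped = homog @ H.T
    # Divide by the third coordinate to return to image coordinates.
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical part seen in view A: corners of a unit square.
part_view_a = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

# Hypothetical homography from view A to view B (assumed values):
# stretch x by 2, then translate by (3, 1).
H_ab = np.array([[2.0, 0.0, 3.0],
                 [0.0, 1.0, 1.0],
                 [0.0, 0.0, 1.0]])

# Predicted location of the same part in view B.
part_view_b = apply_homography(H_ab, part_view_a)
print(part_view_b)
```

Applying the inverse matrix, np.linalg.inv(H_ab), maps the corners back to view A, which is why a pair of parts can be linked by a single mutual transformation instead of a full 3D reconstruction.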
References
The Princeton shape benchmark. In: Proceedings of the Shape Modeling International, pp. 167–178 (2004)
Arie-Nachimson, M., Basri, R.: Constructing implicit 3d shape models for pose estimation. In: Proceedings of the International Conference on Computer Vision (2009)
Ballard, D.H.: Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognition 13(2), 111–122 (1981)
Bart, E., Byvatov, E., Ullman, S.: View-invariant recognition using corresponding object fragments. In: Pajdla, T., Matas, J. (eds.) ECCV 2004. LNCS, vol. 3022, pp. 152–165. Springer, Heidelberg (2004)
Bowyer, K., Dyer, R.: Aspect graphs: An introduction and survey of recent results. International Journal of Imaging Systems and Technology 2(4), 315–328 (1990)
Brown, M., Lowe, D.G.: Unsupervised 3d object recognition and reconstruction in unordered datasets. In: Proceedings of the Fifth International Conference on 3-D Digital Imaging and Modeling, pp. 56–63 (2005)
Burl, M.C., Weber, M., Perona, P.: A probabilistic approach to object recognition using local photometry and global geometry. In: Burkhardt, H., Neumann, B. (eds.) ECCV 1998. LNCS, vol. 1407, p. 628. Springer, Heidelberg (1998)
Chen, S., Williams, L.: View interpolation for image synthesis. Computer Graphics 27, 279–288 (1993)
Chiu, H.P., Kaelbling, L.P., Lozano-Perez, T.: Virtual training for multi-view object class recognition. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2007)
Cyr, C., Kimia, B.: A similarity-based aspect-graph approach to 3D object recognition. International Journal of Computer Vision 57(1), 5–22 (2004)
Dance, C., Willamowski, J., Fan, L., Bray, C., Csurka, G.: Visual categorization with bags of keypoints. In: Proceedings of the ECCV International Workshop on Statistical Learning in Computer Vision (2004)
Dickinson, S.J., Pentland, A.P., Rosenfeld, A.: 3-d shape recovery using distributed aspect matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 14(2), 174–198 (1992)
Eggert, D., Bowyer, K.: Computing the perspective projection aspect graph of solids of revolution. IEEE Transactions on Pattern Analysis and Machine Intelligence 15(2), 109–128 (1993)
Eggert, D., Bowyer, K., Dyer, C., Christensen, H., Goldgof, D.: The scale space aspect graph. IEEE Transactions on Pattern Analysis and Machine Intelligence 15(11), 1114–1130 (1993)
Everingham, M., et al.: The 2005 PASCAL visual object class challenge. In: Proceedings of the 1st PASCAL Challenges Workshop (to appear)
Farhadi, A., Tabrizi, J., Endres, I., Forsyth, D.: A latent model of discriminative aspect. In: Proceedings of the International Conference on Computer Vision (2009)
Fei-Fei, L., Fergus, R., Torralba, A.: Recognizing and learning object categories. CVPR Short Course (2007)
Felzenszwalb, P., Huttenlocher, D.: Pictorial structures for object recognition. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition, pp. 2066–2073 (2000)
Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition, pp. 264–271 (2003)
Ferrari, V., Tuytelaars, T., Van Gool, L.: Simultaneous object recognition and segmentation from single or multiple model views. International Journal of Computer Vision (2006)
Fischler, M., Bolles, R.: Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Comm. of the ACM 24, 381–395 (1981)
Frome, A., Huber, D., Kolluri, R., Bulow, T., Malik, J.: Recognizing objects in range data using regional point descriptors. In: Pajdla, T., Matas, J. (eds.) ECCV 2004. LNCS, vol. 3023, pp. 224–237. Springer, Heidelberg (2004)
Fulkerson, B., Vedaldi, A., Soatto, S.: Class Segmentation and Object Localization with Superpixel Neighborhoods. In: Proceedings of the International Conference on Computer Vision (2009)
Grimson, W., Lozano-Perez, T.: Recognition and localization of overlapping parts in two and three dimensions. In: Proceedings of the International Conference on Robotics and Automation, pp. 61–66 (1985)
Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision, 2nd edn. Cambridge University Press, Cambridge (2004)
Hetzel, G., Leibe, B., Levi, P., Schiele, B.: 3d object recognition from range images using local feature histograms. IEEE Transactions on Pattern Analysis and Machine Intelligence 2 (2001)
Hoiem, D., Rother, C., Winn, J.: 3d layout CRF for multi-view object class recognition and segmentation. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2007)
Johnson, A., Hebert, M.: Using spin images for efficient object recognition in cluttered 3d scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence 5 (1999)
Kadir, T., Brady, M.: Scale, saliency and image description. International Journal of Computer Vision 45(2), 83–105 (2001)
Kazhdan, M., Funkhouser, T., Rusinkiewicz, S.: Rotation invariant spherical harmonic representation of 3d shape descriptors. In: Proceedings of the Symposium on Geometry Processing (2003)
Koenderink, J., van Doorn, A.: The singularities of the visual mappings. Biological Cybernetics 24(1), 51–59 (1976)
Koenderink, J.J., van Doorn, A.J.: The internal representation of solid shape with respect to vision. Biological Cybernetics 32(4), 211–216 (1979)
Kushal, A., Schmid, C., Ponce, J.: Flexible object models for category-level 3d object recognition. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2007)
Lazebnik, S., Schmid, C., Ponce, J.: Semi-local affine parts for object recognition. In: Proceedings of the British Machine Vision Conference, vol. 2, pp. 959–968 (2004)
Leibe, B., Schiele, B.: Scale Invariant Object Categorization Using a Scale-Adaptive Mean-Shift Search. In: Rasmussen, C.E., Bülthoff, H.H., Schölkopf, B., Giese, M.A. (eds.) DAGM 2004. LNCS, vol. 3175, pp. 145–153. Springer, Heidelberg (2004)
Li, X., Guskov, I., Barhak, J.: Feature-based alignment of range scan data to cad model. International Journal of Shape Modeling 13, 1–23 (2007)
Liebelt, J., Schmid, C., Schertler, K.: Viewpoint-independent object class detection using 3d feature maps. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2008)
Lowe, D.: Object recognition from local scale-invariant features. In: Proceedings of the International Conference on Computer Vision, pp. 1150–1157 (1999)
Lowe, D.G.: Three-dimensional object recognition from single two-dimensional images. Artificial Intelligence 31, 355–395 (1987)
Lowe, D.G.: Local feature view clustering for 3d object recognition. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2001)
Marr, D.: Vision: A computational investigation into the human representation and processing of visual information. Freeman, New York (1982)
Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: Proceedings of the British Machine Vision Conference, pp. 384–393 (2002)
Mei, L., Sun, M., Carter, K., Hero, A., Savarese, S.: Object pose classification from short video sequences. In: Proceedings of the British Machine Vision Conference (2009)
Mikolajczyk, K., Schmid, C.: An affine invariant interest point detector. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 128–142. Springer, Heidelberg (2002)
Mikolajczyk, K., Schmid, C.: Scale and affine invariant interest point detectors. International Journal of Computer Vision 60(1), 63–86 (2004)
Murase, H., Nayar, S.K.: Learning by a generation approach to appearance-based object recognition. In: Proceedings of the International Conference on Pattern Recognition (1996)
Nayar, S.K., Nene, S.A., Murase, H.: Real-time 100 object recognition system. In: Proceedings of the International Conference on Robotics and Automation, pp. 2321–2325 (1996)
Ng, J., Gong, S.: Multi-view face detection and pose estimation using a composite support vector machine across the view sphere. In: Proceedings of the International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems (1999)
Ozuysal, M., Lepetit, V., Fua, P.: Pose estimation for category specific multiview object localization. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2009)
Rothganger, F., Lazebnik, S., Schmid, C., Ponce, J.: 3d object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints. International Journal of Computer Vision 66(3), 231–259 (2006)
Rothwell, C.A., Zisserman, A., Forsyth, D.A., Mundy, J.L., Joseph, L.: Canonical frames for planar object recognition. In: Sandini, G. (ed.) ECCV 1992. LNCS, vol. 588. Springer, Heidelberg (1992)
Ruiz-Correa, S., Shapiro, L., Meila, M.: A new signature-based method for efficient 3-d object recognition. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2001)
Russell, B., Torralba, A., Murphy, K., Freeman, W.: LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision (in press)
Savarese, S., Fei-Fei, L.: 3D generic object categorization, localization and pose estimation. In: Proceedings of the International Conference on Computer Vision, pp. 1–8 (2007)
Savarese, S., Fei-Fei, L.: View synthesis for recognizing unseen poses of object classes. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part III. LNCS, vol. 5304, pp. 602–615. Springer, Heidelberg (2008)
Schneiderman, H., Kanade, T.: A statistical approach to 3D object detection applied to faces and cars. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition, pp. 746–751 (2000)
Seitz, S., Dyer, C.: View morphing. In: Proceedings of the ACM SIGGRAPH, pp. 21–30 (1996)
Shimshoni, I., Ponce, J.: Finite-resolution aspect graphs of polyhedral objects. IEEE Transactions on Pattern Analysis and Machine Intelligence 19(4), 315–327 (1997)
Stewman, J., Bowyer, K.: Learning graph matching. In: Proceedings of the International Conference on Computer Vision, pp. 494–500 (1988)
Su, H., Sun, M., Fei-Fei, L., Savarese, S.: Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories. In: Proceedings of the International Conference on Computer Vision (2009)
Sun, M., Su, H., Savarese, S., Fei-Fei, L.: A multi-view probabilistic model for 3d object classes. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2009)
Tangelder, J.W.H., Veltkamp, R.C.: A survey of content based 3d shape retrieval methods. In: Proceedings of Shape Modeling Applications, pp. 145–156 (2004)
Thomas, A., Ferrari, V., Leibe, B., Tuytelaars, T., Schiele, B., Van Gool, L.: Towards multi-view object class detection. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition, pp. 1589–1596 (2006)
Torralba, A., Murphy, K., Freeman, W.: Sharing features: efficient boosting procedures for multiclass object detection. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2004)
Ullman, S., Basri, R.: Recognition by linear combination of models. Technical Report, Cambridge, MA, USA (1989)
Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition, pp. 511–518 (2001)
Weber, M., Einhäuser, W., Welling, M., Perona, P.: Viewpoint-invariant learning and detection of human heads. In: Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition (2000)
Weber, M., Welling, M., Perona, P.: Unsupervised learning of models for recognition. In: Vernon, D. (ed.) ECCV 2000. LNCS, vol. 1842, pp. 101–108. Springer, Heidelberg (2000)
Xiao, J., Chen, J., Yeung, D.Y., Quan, L.: Structuring visual words in 3d for arbitrary-view object localization. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part III. LNCS, vol. 5304, pp. 725–737. Springer, Heidelberg (2008)
Yan, P., Khan, D., Shah, M.: 3d model based object class detection in an arbitrary view. In: Proceedings of the International Conference on Computer Vision (2007)
Li, Y., Gu, L., Kanade, T.: A robust shape model for multi-view car alignment. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2009)
Zhang, Z.: Floatboost learning and statistical face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(9), 1112–1123 (2004)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Savarese, S., Fei-Fei, L. (2010). Multi-view Object Categorization and Pose Estimation. In: Cipolla, R., Battiato, S., Farinella, G.M. (eds) Computer Vision. Studies in Computational Intelligence, vol 285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12848-6_8
DOI: https://doi.org/10.1007/978-3-642-12848-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12847-9
Online ISBN: 978-3-642-12848-6
eBook Packages: Engineering, Engineering (R0)