Abstract
A major problem in object recognition is that a novel image of a given object can be different from all previously seen images. Images can vary considerably due to changes in viewing conditions such as viewing position and illumination. In this paper we distinguish between three types of recognition schemes by the level at which generalization to novel images takes place: universal, class, and model-based. The first is applicable equally to all objects, the second to a class of objects, and the third uses known properties of individual objects. We derive theoretical limitations on each of the three generalization levels. For the universal level, previous results have shown that no invariance can be obtained. Here we show that this limitation holds even when the assumptions made on the objects and the recognition functions are relaxed. We also extend the results to changes of illumination direction. For the class level, previous studies presented specific examples of classes of objects for which functions invariant to viewpoint exist. Here, we distinguish between classes that admit such invariance and classes that do not. We demonstrate that there is a tradeoff between the set of objects that can be discriminated by a given recognition function and the set of images from which the recognition function can recognize these objects. Furthermore, we demonstrate that although functions invariant to illumination direction do not exist at the universal level, such a function can be defined when the objects are restricted to a given class. A general conclusion of this study is that class-based processing, which has not been used extensively in the past, is often advantageous for dealing with variations due to changes in viewpoint and illumination.
Cite this article
Moses, Y., Ullman, S. Generalization to Novel Views: Universal, Class-based, and Model-based Processing. International Journal of Computer Vision 29, 233–253 (1998). https://doi.org/10.1023/A:1008088813977
DOI: https://doi.org/10.1023/A:1008088813977