
Learning Recognition and Segmentation Using the Cresceptron

International Journal of Computer Vision

Abstract

This paper presents a framework called Cresceptron for view-based learning, recognition and segmentation. Specifically, it recognizes and segments image patterns that are similar to those learned, using a stochastic distortion model and view-based interpolation, which allows for viewpoints moderately different from those used in learning. The learning phase is interactive. The user trains the system using a collection of training images. For each training image, the user manually draws a polygon outlining the region of interest and types in the label of its class. Then, from the directional edges of each segmented region, the Cresceptron uses a hierarchical self-organization scheme to grow a sparsely connected network automatically, adaptively and incrementally during the learning phase. At each level, the system detects new image structures that need to be learned and assigns a new neural plane to each new feature. The network grows by creating new nodes and connections that memorize the new image structures and their context as they are detected. Thus, the structure of the network is a function of the training exemplars. The Cresceptron incorporates both individual learning and class learning: with the former, each training example is treated as a different individual, while with the latter, each example is a sample of a class. In the performance phase, segmentation and recognition are tightly coupled. No separate foreground extraction is necessary; segmentation is achieved by backtracking the response of the network down the hierarchy to the image parts contributing to recognition. Several stochastic shape distortion models are analyzed to show why multilevel matching such as that in the Cresceptron can deal with more general stochastic distortions, which a single-level matching scheme cannot handle.
The system is demonstrated using images from broadcast television and other video segments to learn faces and other objects, and then later to locate and to recognize similar, but possibly distorted, views of the same objects.
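The growth rule described above (detect a structure that a level has not yet seen, then assign it a new neural plane) can be illustrated with a one-level toy in Python. This is a sketch under our own assumptions, not the Cresceptron's actual procedure: the mean-removed cosine similarity, the 0.9 reuse threshold, the 1-D patches, and the names `GrowingLayer` and `centered_cosine` are all hypothetical. The real system works on directional edges of 2-D images and grows many levels, each also memorizing context.

```python
import math
from typing import List, Sequence, Tuple


def centered_cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity after subtracting each vector's mean (normalized correlation)."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    na = math.sqrt(sum(x * x for x in da))
    nb = math.sqrt(sum(y * y for y in db))
    if na == 0.0 or nb == 0.0:
        return 0.0  # a flat patch has no structure to compare
    return sum(x * y for x, y in zip(da, db)) / (na * nb)


class GrowingLayer:
    """Hypothetical single level that adds a feature plane for every unseen structure."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold  # assumed similarity needed to reuse an existing plane
        self.planes: List[List[float]] = []  # one stored template per feature plane

    def respond(self, patch: Sequence[float]) -> Tuple[int, float]:
        """Return the index and similarity of the best-matching plane (-1 if none exist)."""
        best_idx, best_sim = -1, -math.inf
        for i, template in enumerate(self.planes):
            sim = centered_cosine(patch, template)
            if sim > best_sim:
                best_idx, best_sim = i, sim
        return best_idx, best_sim

    def learn(self, patch: Sequence[float]) -> int:
        """Reuse a matching plane or grow a new one; flat patches are ignored (-1)."""
        if max(patch) == min(patch):
            return -1  # no structure worth memorizing
        idx, sim = self.respond(patch)
        if sim >= self.threshold:
            return idx  # structure already known: no growth
        self.planes.append([float(x) for x in patch])  # grow a new plane
        return len(self.planes) - 1


layer = GrowingLayer(threshold=0.9)
print(layer.learn([0, 0, 1, 1]))  # 0: new plane for a rising edge
print(layer.learn([0, 0, 2, 2]))  # 0: same structure at higher contrast, reused
print(layer.learn([1, 1, 0, 0]))  # 1: new plane for a falling edge
print(len(layer.planes))          # 2
```

Because planes are added only when an unmatched structure appears, the size of this toy layer depends on the training exemplars, which mirrors the property the abstract emphasizes.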
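Coupling segmentation with recognition by backtracking can be sketched in a similar way. In the hypothetical sketch below, each node keeps its response and its children; starting from a recognized top node, the search descends only through children whose response meets a threshold and reports the input pixels it reaches. The `Node` structure, the fixed threshold test, and the 1-D pixel indices are assumptions for illustration, not details taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """Hypothetical network node: a leaf covers one input pixel; an inner node combines children."""
    response: float
    pixel: int = -1  # input pixel index for leaf nodes, -1 for inner nodes
    children: List["Node"] = field(default_factory=list)


def backtrack(node: Node, threshold: float) -> Set[int]:
    """Collect the input pixels reached through nodes whose response meets the threshold."""
    if node.response < threshold:
        return set()  # this branch did not contribute to recognition
    if not node.children:
        return {node.pixel} if node.pixel >= 0 else set()
    pixels: Set[int] = set()
    for child in node.children:
        pixels |= backtrack(child, threshold)
    return pixels


# Six input pixels, two mid-level nodes, one top-level (recognized) node.
leaves = [Node(r, pixel=i) for i, r in enumerate([0.9, 0.8, 0.1, 0.95, 0.7, 0.2])]
left = Node(0.85, children=leaves[0:3])
right = Node(0.8, children=leaves[3:6])
top = Node(0.82, children=[left, right])
print(sorted(backtrack(top, 0.5)))  # [0, 1, 3, 4]: the pixels segmented as the object
```

The weakly responding pixels 2 and 5 are excluded, so the returned set acts as a segmentation mask without any separate foreground-extraction step.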




About this article

Cite this article

Weng, J., Ahuja, N. & Huang, T.S. Learning Recognition and Segmentation Using the Cresceptron. International Journal of Computer Vision 25, 109–143 (1997). https://doi.org/10.1023/A:1007967800668


  • DOI: https://doi.org/10.1023/A:1007967800668
