Incremental class learning approach and its application to handwritten digit recognition

https://doi.org/10.1016/S0020-0255(02)00170-6

Abstract

Incremental class learning (ICL) provides a feasible framework for the development of scalable learning systems. Instead of learning a complex problem at once, ICL focuses on learning subproblems incrementally, one at a time – using the results of prior learning for subsequent learning – and then combining the solutions in an appropriate manner. With respect to multi-class classification problems, the ICL approach presented in this paper can be summarized as follows. Initially the system focuses on one category. After it learns this category, it tries to identify a compact subset of features (nodes) in the hidden layers that are crucial for the recognition of this category. The system then freezes these crucial features by fixing the incoming weights of the corresponding nodes. As a result, these features cannot be obliterated in subsequent learning. The frozen features remain available during subsequent learning and can serve as parts of weight structures built to recognize other categories. As more categories are learned, the set of features gradually stabilizes and learning a new category requires less effort. Eventually, learning a new category may only involve combining existing features in an appropriate manner. The approach promotes the sharing of learned features among a number of categories and also alleviates the well-known catastrophic interference problem. We present promising results of applying the ICL approach to the unconstrained handwritten digit recognition problem, based on a spatio-temporal representation of patterns.

Introduction

The catastrophic interference problem [1], [2], [3] remains a significant impediment in building large, scalable learning systems based on neural networks. In its simplest form, the problem may be stated as follows: when a network trained to solve task A is subsequently trained to solve task B, it “forgets” the solution to task A. In other words, the network is unable to acquire new knowledge without destroying previously acquired knowledge structures. A seemingly simple solution to this problem is to retrain the network on a cumulative training set containing examples from all previously learned categories. However, for large-scale problems this approach is not practical.
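The forgetting effect described above can be reproduced with a toy example (ours, not from the paper): a single linear unit trained by plain gradient descent on task A and then on task B ends up with a large error on task A, because the one set of weights is simply overwritten.

```python
# Toy illustration of catastrophic interference (not from the paper):
# a single linear unit y = w * x trained sequentially on two tasks.
def train(w, pairs, lr=0.1, steps=200):
    for _ in range(steps):
        for x, y in pairs:
            w -= lr * (w * x - y) * x        # squared-error gradient step
    return w

def mse(w, pairs):
    return sum((w * x - y) ** 2 for x, y in pairs) / len(pairs)

task_a = [(1.0, 1.0), (2.0, 2.0)]    # target function y = +x
task_b = [(1.0, -1.0), (2.0, -2.0)]  # target function y = -x

w = train(0.0, task_a)               # w converges near +1
err_a_before = mse(w, task_a)
w = train(w, task_b)                 # w converges near -1, erasing task A
err_a_after = mse(w, task_a)
print(err_a_before, err_a_after)     # near zero, then large: task A "forgotten"
```

Retraining on a cumulative set of both tasks would avoid this, which is exactly the remedy the text notes is impractical at scale.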

Incremental training methods are especially important for fast or real-time control problems (e.g. ATM traffic control). Because of the large amount of noisy on-line data, multilayer perceptron-type networks with their relatively slow retraining schemes are not well suited to such tasks. While there exist neural network models with learning schemes that are not prone to catastrophic interference (e.g. ART-type networks [4]), their effectiveness in large-scale and noisy problem domains is still under investigation. Some promising results have been obtained with hybrid neuro-fuzzy ART models, e.g. Fuzzy ARTMAP, FasArt and PROBART.

The incremental class learning (ICL) approach [5] attempts to address the catastrophic interference problem and at the same time offers a learning framework that promotes the sharing of previously learned knowledge structures. With respect to object recognition and classification problems, the approach may be summarized as follows: The system starts off with all the nodes and links it will ever have, but initially, it focuses on only a small number of categories. After it learns to recognize these categories, it tries to identify which of the features formed in the “hidden layers” play a critical role in the recognition of these categories. The system “freezes” these critical features by fixing their input weights. As a result, they cannot be obliterated by subsequent learning. These frozen features, however, can participate in structures that are learned subsequently to recognize other categories. As the system learns to recognize more and more categories, it is hoped that the set of features will gradually stabilize and eventually, learning a new category will primarily consist of combining existing features in novel ways.
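The central mechanism, fixing the input weights of critical hidden nodes so that later training cannot alter them, can be sketched in a few lines. This is our minimal illustration, not the paper's exact procedure: a frozen node is implemented by masking the gradient of its incoming weight row.

```python
import numpy as np

# Minimal sketch (our illustration): "freeze" a hidden node by zeroing the
# gradient of its incoming weights, so later updates leave its feature intact.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))                      # 3 inputs -> 4 hidden nodes
frozen = np.array([True, False, False, False])   # node 0 declared critical

def sgd_step(W, grad, lr=0.1):
    grad = np.where(frozen[:, None], 0.0, grad)  # zero the frozen rows
    return W - lr * grad

W_before = W.copy()
W = sgd_step(W, rng.normal(size=W.shape))        # a later training update
print(np.allclose(W[0], W_before[0]))            # frozen row is untouched
```

Note that only the *incoming* weights are fixed; outgoing connections from a frozen node remain trainable, which is what lets frozen features participate in structures learned for new categories.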

The paper is organized as follows. In Section 2, the proposed ICL approach is described and its relation to some of the existing incremental learning methods is discussed. Section 3 presents computer simulation results of the ICL for the handwritten digit recognition (HDR) problem. Conclusions and directions for future work appear in Section 4.

Section snippets

Incremental class learning

The ICL approach is a supervised learning procedure for neural networks that can be described as follows:

  • Subproblems are learned incrementally.

  • Structures playing a critical role in solving a subproblem are frozen.

  • The above structures are available for subsequent learning.

  • Solutions to subproblems are combined in an appropriate manner to solve the complete problem.
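The four steps above can be read as a simple loop. The sketch below is a schematic rendering with toy stand-ins for the training and feature-identification routines; the paper's actual network and criterion for identifying critical features are more elaborate.

```python
# Schematic ICL loop (our reading; toy stand-ins replace real training and
# the criterion for identifying critical features).
def icl(categories, train_one, find_critical):
    frozen = set()                    # indices of frozen features
    for c in categories:              # learn subproblems one at a time
        train_one(c, frozen)          # frozen features' weights stay fixed
        frozen |= find_critical(c)    # freeze features crucial for c
    return frozen

# Toy stand-ins: each category is "recognized" via some feature indices.
critical = {"zero": {1, 2}, "one": {2, 3}, "two": {3}}
log = []
frozen = icl(list(critical),
             train_one=lambda c, f: log.append((c, sorted(f))),
             find_critical=lambda c: critical[c])
print(frozen)   # the shared, stabilized feature set after all categories
```

The run also shows the intended sharing: later categories ("one", "two") find progressively more frozen features already in place, so less new structure has to be learned.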

The success of the approach depends on four key factors. First, it should be possible to decompose the problem into subproblems in an

ICL application to HDR problem

In this section we present the application of the proposed ICL method to the HDR problem. The main objective of the work reported here is to show the efficacy of the proposed learning scheme in the context of a non-trivial and real-world problem domain involving noisy data. While we strive for a solid recognition performance, it is not our objective to develop a state-of-the-art HDR system. Such systems achieve higher recognition rates by incorporating sophisticated and specialized post- and

Conclusions

In this work we have proposed the ICL approach based on freezing relevant features and sharing common (similar) features among multiple classes. The ICL approach not only takes advantage of existing knowledge when learning a new problem, it also offers immunity from the catastrophic interference problem. Promising results obtained for the unconstrained HDR problem suggest that the approach may be a suitable framework for building large, scalable learning systems. We conjecture that the sharing

References (21)

  • M. McCloskey et al., Catastrophic interference in connectionist networks: the sequential learning problem
  • G.A. Carpenter et al., A massively parallel architecture for a self-organizing neural pattern recognition machine, Computer Vision, Graphics, and Image Processing (1987)
  • D. Rumelhart et al., Feature discovery by competitive learning, Cognitive Science (1985)
  • J.L. McClelland, B.L. McNaughton, R.C. O'Reilly, Why there are complementary learning systems in the hippocampus and...
  • E. Pessa et al., Catastrophic interference in learning process by neural networks
  • L. Shastri, Attribution learning as a solution to the catastrophic interference problem in learning with neural nets,...
  • S.E. Fahlman et al., The cascade-correlation learning architecture
  • M. Frean, The upstart algorithm: a method for constructing and training feedforward neural networks, Neural Computation (1990)
  • A. Waibel, Consonant recognition by modular construction of large phonemic time-delay neural networks
  • L. Shastri et al., Recognizing handwritten digit strings using modular spatio-temporal connectionist networks, Connection Science (1995)

Cited by (19)

  • Domain adaptation and continual learning in semantic segmentation

    2021, Advanced Methods and Deep Learning in Computer Vision
  • Achieving a compromise between performance and complexity of structure: An incremental approach

    2015, Information Sciences
    Citation Excerpt :

    These situations arise during many real-world tasks, and thus machine learning techniques that apply incremental learning have been applied to several applications, such as recommender systems that adapt dynamically [30], causality models for events built from data streams [1], robots that learn dynamic models incrementally [16] and adaptive mechanisms of spam filtering [13]. Several incremental approaches have been presented in the literature [6,27,28,33]. However, given the large amount of data needed for long-term training, these techniques tend to produce models with complicated structures, rendering them unfeasible for accomplishing certain tasks.

  • Reinforced learning systems based on merged and cumulative knowledge to predict human actions

    2014, Information Sciences
    Citation Excerpt :

    Imitation, observation and trial-and-error are usually used to implement learning systems. These systems are based on iterative learning [8,20,21,29,50,53,54] or incremental learning [6,19,26–28] techniques and use the results of previous learning phases to predict the current or future output data of a given system. Iterative learning control is usually used to learn from tracking errors, which are subsequently minimized when achieving automated repetitive tasks.

  • A multi-viewpoint system to support abductive reasoning

    2011, Information Sciences
    Citation Excerpt :

    Reasoning based on reinforcement to strengthen the reorganisation of knowledge [26,29,38,39]. Incremental reasoning to manage online-skill knowledge, focusing on a single problem at a time [22,23]. Reasoning based on adjustments to manage the current knowledge [20].

1. This research was performed while the author was visiting the International Computer Science Institute and the EECS Department, University of California at Berkeley, Berkeley, CA, USA, thanks to the support of Fulbright Senior Scholarship No. 20895.
