Elsevier

Pattern Recognition

Volume 40, Issue 3, March 2007, Pages 993-1003

Handwritten digit classification using higher order singular value decomposition

https://doi.org/10.1016/j.patcog.2006.08.004

Abstract

In this paper we present two algorithms for handwritten digit classification based on the higher order singular value decomposition (HOSVD). The first algorithm uses HOSVD to construct the class models and achieves an error rate below 6%. The second algorithm uses HOSVD for tensor approximation simultaneously in two modes. Its error rate is close to 5%, even though the approximation reduces the original training data by more than 98% before the class models are constructed. For both algorithms, the actual classification in the test phase is carried out by solving a series of least squares problems. In terms of the computational cost of the test phase, the second algorithm is twice as efficient as the first.

Introduction

The automatic classification of handwritten digits is often considered a standard problem in pattern recognition, and it involves many of the difficulties encountered in this area. Assigning an unknown object to one of 10 predefined classes is a hard problem, since the variation of the objects within each class is high, while objects from different classes may be quite similar.

There are many different approaches to solving this problem: principal component analysis (PCA), support vector methods, nearest neighbor and k-nearest neighbor methods, regression, statistical modelling and neural networks. Surveys of different pattern recognition methods are given in Refs. [1], [2]. A comparison of different algorithms for classification of handwritten digits is given in Ref. [3]. The best performing algorithms are based on neural networks [4] and on the so-called tangent distance [5], a distance measure that is invariant under local affine transformations. Other algorithms are given in Refs. [6] (extended tangent distance), [7] (elastic matching) and [8] (manifold modelling). In general, algorithms with good performance either have large descriptive complexity or are computationally heavy (in training and/or classification).

In this paper we present two simple and efficient algorithms with fairly good performance. Both algorithms are based on the higher order singular value decomposition (HOSVD) of a tensor [9]. The first algorithm uses HOSVD to compute a small set of basis matrices that span the dominant subspace for each class of digits. The basis matrices are then used to describe unknown digits. This algorithm is closely related to SIMCA [10] and PCA. The second algorithm uses HOSVD to compress the training set. The class models (here basis vectors) are computed using the reduced data only, and classification is performed as in the first algorithm. The advantages are twofold: the class models require less memory, and the classification phase is more efficient, with no loss in performance. The algorithm gives an error rate of 5% even after the training set has been compressed by more than 98%.
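The idea behind the first algorithm can be sketched in Python with NumPy. This is only one plausible reading of the description above, not the authors' implementation: the function names (`unfold`, `hosvd`, `class_basis`, `residual`, `classify`), the choice of taking basis matrices from the third mode of the HOSVD, and the synthetic 8×8 data are all assumptions made for illustration.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: the mode-n fibers of T become the columns of a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T):
    """HOSVD of a 3-tensor: T = S x1 U[0] x2 U[1] x3 U[2]."""
    U = [np.linalg.svd(unfold(T, n), full_matrices=False)[0] for n in range(3)]
    S = np.einsum('ijk,ia,jb,kc->abc', T, U[0], U[1], U[2])
    return S, U

def class_basis(T, k):
    """First k basis matrices S(:,:,j) x1 U[0] x2 U[1], each scaled to unit norm."""
    S, U = hosvd(T)
    B = np.einsum('abj,ia,lb->ilj', S[:, :, :k], U[0], U[1])
    return B / np.linalg.norm(B, axis=(0, 1))

def residual(D, B):
    """Least squares residual of image D against the basis matrices B[:, :, j]."""
    M = B.reshape(-1, B.shape[2])  # each column is a vectorized basis matrix
    alpha = np.linalg.lstsq(M, D.ravel(), rcond=None)[0]
    return np.linalg.norm(D.ravel() - M @ alpha)

def classify(D, bases):
    """Assign D to the class whose basis gives the smallest residual."""
    return int(np.argmin([residual(D, B) for B in bases]))

# Synthetic stand-in for digit images (hypothetical data, not USPS): every
# class consists of noisy mixtures of 3 class-specific 8x8 patterns.
rng = np.random.default_rng(0)
patterns = [rng.standard_normal((8, 8, 3)) for _ in range(2)]

def sample(c, n):
    """Draw n images of class c as a tensor of shape (8, 8, n)."""
    coeffs = rng.standard_normal((3, n))
    return patterns[c] @ coeffs + 0.01 * rng.standard_normal((8, 8, n))

train = [sample(c, 40) for c in range(2)]      # one tensor per class
bases = [class_basis(T, k=3) for T in train]   # one set of basis matrices per class
test_image = sample(1, 1)[:, :, 0]
predicted = classify(test_image, bases)
```

Because the factor matrices come from SVDs of the unfoldings, the basis matrices of a class are mutually orthogonal, so the least squares problem in `residual` could also be solved with inner products alone; `np.linalg.lstsq` is used here only to keep the connection to the least squares formulation explicit.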

In recent years the application of tensor methods to problems in pattern recognition and other areas has attracted more and more attention. By tensors we mean multidimensional or multimode arrays. Often the data have a multidimensional structure and it is then somewhat unnatural to organize them as matrices or vectors. As a simple example, consider a time series of images. Each image is a two-dimensional data array and together with images from different time steps the data constitute a third order tensor (three-dimensional data array). In many cases it is beneficial to use the collected data without destroying its inherent multidimensional structure. Tensor methods have been used for a long time in chemometrics and psychometrics (see, e.g., [11]). Recently HOSVD has been applied to face recognition [12].

In this paper we use handwritten digits from the US Postal Service database to test the performance of the algorithms proposed. The digits are represented by 16×16 gray scale pixel images scanned from envelopes. This data set is widely used for evaluation of classification algorithms. Samples are shown in Figure 1.

The rest of the paper is structured as follows: Section 2 contains an introduction to tensor concepts and some theoretical results that are used in the algorithms. In Section 3 both algorithms are presented. The numerical tests are described in Section 4, which also gives a more detailed description of the data set.

The algorithms will be illustrated using pseudo-MATLAB code. Thus, in code examples we will use the notation A(i,j,k) meaning $a_{ijk}$. Also in formulas we will sometimes use MATLAB-style notation. For instance, we define the mode-1 fibers of a 3-tensor A to be the column vectors A(:,j,k). The definition of fibers along the other modes is obvious. Thus, fibers are characterized by fixing the index in all modes but one. Similarly, we define slices of a tensor to be the sub-tensors obtained by fixing the index in one mode, e.g. A(:,:,k).
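For readers more used to NumPy than MATLAB, the fiber and slice notation above maps onto array indexing roughly as follows. The tensor size and the variable names are arbitrary choices for this illustration, and NumPy indices start at 0 whereas MATLAB indices start at 1.

```python
import numpy as np

# A small 3-tensor with I=2, J=3, K=4; the entry a_ijk is A[i, j, k].
A = np.arange(24).reshape(2, 3, 4)

# Mode-1 fiber A(:,j,k): fix the index in every mode but the first.
fiber_mode1 = A[:, 1, 2]   # shape (2,)

# Fibers along the other modes are obtained in the same way.
fiber_mode2 = A[0, :, 2]   # shape (3,)
fiber_mode3 = A[0, 1, :]   # shape (4,)

# Slice A(:,:,k): fix the index in one mode only.
slice_k = A[:, :, 3]       # shape (2, 3)
```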

Section snippets

Tensor concepts

Loosely speaking, an $N$th order tensor is an object with $N$ indices. The “dimensions” of a tensor will be referred to as modes. Vectors and matrices can be considered as first and second order tensors, respectively. In the applications of this paper we will deal with the case $N = 3$. Therefore, for simplicity, some of the theory presented in this section is stated only for third order tensors $A \in \mathbb{R}^{I \times J \times K}$, where $I$, $J$, $K$ are positive integers; the vector space $\mathbb{R}^{I \times J \times K}$ has dimension $IJK$. The generalization

Algorithm 1: Classification by HOSVD

In this section we describe how HOSVD can be used to build algorithms for handwritten digit classification. The training set of the digits is manually classified. Considering each digit as a point in $\mathbb{R}^{20 \times 20}$, it is reasonable to assume that the digits of the training set constitute 10 fairly well separated clusters,

Tests and results

The process described thus far is general, and there are several parameters that can be varied to customize an algorithm. One such parameter is the number k of basis matrices to be used in the classification. Other parameters are p and q in the second algorithm. In this section we present the tests performed to validate the algorithms. But first we give a brief description of the data set we used in the tests and how it was preprocessed.
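To make the role of these parameters concrete, the following Python/NumPy sketch compresses a training tensor in two modes and classifies in the reduced space. It rests on several assumptions that this excerpt does not confirm: that the digits are stored as vectors in a (pixels × digits × classes) tensor, that p and q are the numbers of singular vectors retained in the pixel and digit modes, and that k basis vectors per class are taken from the compressed data. All sizes, names and data are hypothetical.

```python
import numpy as np

def compress_two_modes(D, p, q):
    """Keep p and q leading singular vectors of the mode-1 and mode-2 unfoldings."""
    I, N, C = D.shape
    U = np.linalg.svd(D.reshape(I, N * C), full_matrices=False)[0][:, :p]
    V = np.linalg.svd(np.moveaxis(D, 1, 0).reshape(N, I * C),
                      full_matrices=False)[0][:, :q]
    # F = D x1 U^T x2 V^T has shape (p, q, C).
    F = np.einsum('inc,ip,nq->pqc', D, U, V)
    return F, U

def class_vectors(F, k):
    """k orthonormal basis vectors per class, computed from the reduced data only."""
    return [np.linalg.svd(F[:, :, c], full_matrices=False)[0][:, :k]
            for c in range(F.shape[2])]

def classify(d, U, bases):
    """Project d to p dimensions, then pick the class with the smallest residual."""
    d_p = U.T @ d
    # B has orthonormal columns, so B @ (B.T @ d_p) is the least squares fit.
    residuals = [np.linalg.norm(d_p - B @ (B.T @ d_p)) for B in bases]
    return int(np.argmin(residuals))

# Synthetic stand-in for the training data (hypothetical, not USPS):
# 64 "pixels", 200 samples per class, 2 classes, 3 patterns per class.
rng = np.random.default_rng(0)
I, N, C = 64, 200, 2
patterns = [rng.standard_normal((I, 3)) for _ in range(C)]
D = np.stack([patterns[c] @ rng.standard_normal((3, N))
              + 0.01 * rng.standard_normal((I, N)) for c in range(C)], axis=2)

F, U = compress_two_modes(D, p=8, q=20)
bases = class_vectors(F, k=3)

# Fraction of training entries removed, counting only the compressed tensor F.
reduction = 1 - F.size / D.size

d = patterns[1] @ rng.standard_normal(3)   # an unseen sample of class 1
predicted = classify(d, U, bases)
```

With these made-up sizes the compressed tensor holds 8·20·2 = 320 of the original 64·200·2 = 25600 entries, a reduction of 98.75%. The figure is a property of the chosen toy dimensions and says nothing about the values of p and q used in the paper.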

Conclusions

In this paper we have presented two rather simple and computationally efficient algorithms based on linear and multilinear theory. We have shown that both algorithms achieve satisfactory classification results. In particular, the main algorithm in the paper gives an error rate of 5–5.5% even after the training set has been reduced by more than 98% prior to the construction of the class-specific models.

References (21)

  • R.O. Duda et al., Pattern Classification (2001)
  • T. Hastie et al., The elements of statistical learning (2001)
  • Y. LeCun, L. Jackel, L. Bottou, C. Cortes, J. S. Denker, H. Drucker, I. Guyon, U. A. Müller, E. Säckinger, P. Simard,...
  • Y. LeCun et al., Gradient-based learning applied to document recognition, Proc. IEEE (1998)
  • P.Y. Simard et al., Transformation invariance in pattern recognition—tangent distance and tangent propagation, Internat. J. Imag. Systems Technol. (2000)
  • D. Keysers, J. Dahmen, T. Theiner, H. Ney, Experiments with an extended tangent distance, Technical Report, Lehrstuhl...
  • P. Scattolin, Recognition of handwritten numerals using elastic matching, Master's Thesis, Department of Computer...
  • G.E. Hinton et al., Modelling the manifolds of images of handwritten digits, IEEE Trans. Neural Networks (1997)
  • L. De Lathauwer et al., A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl. (2000)
  • S. Wold, Pattern recognition by means of disjoint principal components models, Pattern Recognition (1975)
There are more references available in the full text version of this article.
