Handwritten digit classification using higher order singular value decomposition
Introduction
The automatic classification of handwritten digits is often considered a standard problem in pattern recognition, and it involves many of the difficulties encountered in this area. Assigning an unknown object to one of 10 predefined classes is hard because the variation of the objects within each class is high, while objects from different classes may be quite similar.
There are many different approaches to solving this problem: principal component analysis (PCA), support vector methods, nearest neighbor and k-nearest neighbor methods, regression, statistical modelling and neural networks. Surveys of different pattern recognition methods are given in Refs. [1], [2]. A comparison of different algorithms for classification of handwritten digits is given in Ref. [3]. The best performing algorithms are based on neural networks [4] and on the so-called tangent distance [5], which uses a distance measure invariant under local affine transformations. Other algorithms are given in Refs. [6] (extended tangent distance), [7] (elastic matching) and [8] (manifold modelling). In general, algorithms with good performance either have large descriptive complexity or are computationally heavy (in training and/or classification).
In this paper we present two simple and efficient algorithms with fairly good performance. Both algorithms are based on the higher order singular value decomposition (HOSVD) of a tensor [9]. The first algorithm uses HOSVD to compute a small set of basis matrices that span the dominant subspace for each class of digits. The basis matrices are then used to describe unknown digits. This algorithm is closely related to SIMCA [10] and PCA. The second algorithm uses HOSVD to compress the training set. The class models (here basis vectors) are computed using the reduced data only, and classification is performed as in the first algorithm. The advantages are twofold: the descriptions of the class models require less memory, and the classification phase is more efficient without any loss in performance. The algorithm gives an error rate of 5% even after the training set has been compressed by more than 98%.
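The way basis matrices can be used to describe an unknown digit may be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the basis matrices of each class are assumed to be orthonormal with respect to the inner product <A, B> = sum of a_ij * b_ij, the unknown digit is assigned to the class with the smallest least squares residual (in the spirit of SIMCA and PCA), and the tiny 2 x 2 bases, the class labels and the names `inner`, `residual` and `classify` are hypothetical. In the paper the bases would come from the HOSVD of the training tensor.

```python
import math


def inner(A, B):
    """Matrix inner product: sum of elementwise products."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


def residual(D, basis):
    """Frobenius norm of D minus its projection onto span(basis).

    Assumption: the matrices in `basis` are orthonormal under `inner`.
    """
    coeffs = [inner(D, B) for B in basis]
    squared = inner(D, D) - sum(c * c for c in coeffs)
    return math.sqrt(max(squared, 0.0))  # guard against rounding below zero


def classify(D, class_bases):
    """Return the label of the class whose basis describes D best."""
    return min(class_bases, key=lambda label: residual(D, class_bases[label]))


# Hypothetical toy classes with one basis matrix each (k = 1).
s = 1.0 / math.sqrt(2.0)
class_bases = {
    "vertical": [[[s, 0.0], [s, 0.0]]],    # stroke in the left column
    "horizontal": [[[s, s], [0.0, 0.0]]],  # stroke in the top row
}

unknown = [[0.9, 0.1], [1.0, 0.0]]
print(classify(unknown, class_bases))  # vertical
```

Increasing the number of basis matrices per class enlarges the subspace used to describe each class, which is the role played by the parameter k in the tests.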
In recent years the application of tensor methods to problems in pattern recognition and other areas has attracted more and more attention. By tensors we mean multidimensional or multimode arrays. Often the data have a multidimensional structure and it is then somewhat unnatural to organize them as matrices or vectors. As a simple example, consider a time series of images. Each image is a two-dimensional data array and together with images from different time steps the data constitute a third order tensor (three-dimensional data array). In many cases it is beneficial to use the collected data without destroying its inherent multidimensional structure. Tensor methods have been used for a long time in chemometrics and psychometrics (see, e.g., [11]). Recently HOSVD has been applied to face recognition [12].
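The time series example can be made concrete with a short sketch. It assumes that all images have the same shape and stores the tensor as nested Python lists; the function names `stack_images` and `shape` and the 2 x 2 frames are hypothetical and serve only as an illustration.

```python
# Stack a time series of images into a third order tensor (nested lists).
# Assumption: every image has the same shape I x J.

def stack_images(images):
    """Return a tensor T with T[i][j][t] = images[t][i][j]."""
    n_rows = len(images[0])
    n_cols = len(images[0][0])
    for img in images:
        if len(img) != n_rows or any(len(row) != n_cols for row in img):
            raise ValueError("all images must have the same shape")
    return [[[img[i][j] for img in images]
             for j in range(n_cols)]
            for i in range(n_rows)]


def shape(tensor):
    """Return (I, J, K) for a nested-list 3-tensor."""
    return (len(tensor), len(tensor[0]), len(tensor[0][0]))


# Three hypothetical 2 x 2 gray scale images taken at time steps 0, 1, 2.
frames = [
    [[0, 1], [2, 3]],
    [[10, 11], [12, 13]],
    [[20, 21], [22, 23]],
]
video = stack_images(frames)
print(shape(video))   # (2, 2, 3)
print(video[1][0])    # pixel (1, 0) over time: [2, 12, 22]
```

The third mode indexes time, so the two-dimensional structure of each image is kept intact instead of being flattened into a vector.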
In this paper we use handwritten digits from the US Postal Service database to test the performance of the algorithms proposed. The digits are represented by gray scale pixel images scanned from envelopes. This data set is widely used for evaluation of classification algorithms. Samples are shown in Figure 1.
The rest of the paper is structured as follows: Section 2 contains an introduction to tensor concepts and some theoretical results that are used in the algorithms. In Section 3 both algorithms are presented. The numerical tests are described in Section 4, which also gives a more detailed description of the data set.
The algorithms will be illustrated using pseudo-MATLAB code. Thus, in code examples we will use the notation A(i,j,k) meaning the tensor element a_ijk. Also in formulas we will sometimes use MATLAB-style notation. For instance, we define the fibers of a 3-tensor to be the column vectors A(:,j,k). The definition of fibers along the other modes is obvious. Thus, fibers are characterized by fixing the index in all modes but one. Similarly, we define slices of a tensor to be the sub-tensors obtained by fixing the index in one mode, e.g. A(:,:,k).
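Fibers and slices can be illustrated with a small sketch. It is written in Python rather than MATLAB, stores the tensor as nested lists with A[i][j][k], and uses indices starting at 0 where MATLAB starts at 1; the helper names `fiber` and `tensor_slice` and the example tensor are hypothetical.

```python
# Fibers and slices of a 3-tensor stored as nested lists, A[i][j][k].

def fiber(A, mode, fixed):
    """Return the fiber along `mode` (0, 1 or 2).

    `fixed` holds the two indices of the other modes, in increasing mode order.
    """
    a, b = fixed
    if mode == 0:                      # MATLAB: A(:, j, k)
        return [A[i][a][b] for i in range(len(A))]
    if mode == 1:                      # MATLAB: A(i, :, k)
        return [A[a][j][b] for j in range(len(A[0]))]
    if mode == 2:                      # MATLAB: A(i, j, :)
        return list(A[a][b])
    raise ValueError("mode must be 0, 1 or 2")


def tensor_slice(A, mode, index):
    """Return the matrix obtained by fixing the index in one mode."""
    if mode == 0:                      # MATLAB: A(i, :, :)
        return [list(row) for row in A[index]]
    if mode == 1:                      # MATLAB: A(:, j, :)
        return [list(A[i][index]) for i in range(len(A))]
    if mode == 2:                      # MATLAB: A(:, :, k)
        return [[A[i][j][index] for j in range(len(A[0]))]
                for i in range(len(A))]
    raise ValueError("mode must be 0, 1 or 2")


# Hypothetical 2 x 3 x 4 tensor whose entries encode their own indices.
A = [[[100 * i + 10 * j + k for k in range(4)]
      for j in range(3)]
     for i in range(2)]

print(fiber(A, 0, (1, 2)))      # [12, 112]
print(fiber(A, 1, (1, 3)))      # [103, 113, 123]
print(tensor_slice(A, 2, 0))    # [[0, 10, 20], [100, 110, 120]]
```

A fiber fixes two of the three indices and a slice fixes one, so a fiber is a vector and a slice is a matrix.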
Tensor concepts
Loosely speaking, an Nth order tensor is an object with N indices. The “dimensions” of a tensor will be referred to as modes. Vectors and matrices can be considered as first and second order tensors, respectively. In the applications of this paper we will deal with the case N = 3. Therefore, for simplicity, some of the theory presented in this section is stated only for third order tensors A ∈ ℝ^(I×J×K), where I, J, K are positive integers; the vector space ℝ^(I×J×K) has dimension IJK. The generalization
Algorithm 1: Classification by HOSVD
In this section we describe how HOSVD can be used to build algorithms for handwritten digit classification. The training set of the digits is manually classified. Considering each digit as a point in a vector space whose dimension equals the number of pixels, it is reasonable to assume that the digits of the training set constitute 10 fairly well separated clusters,
Tests and results
The process described thus far is general, and there are several parameters that can be varied to customize an algorithm. One such parameter is the number k of basis matrices to be used in the classification. Other parameters are p and q in the second algorithm. In this section we present the tests performed to validate the algorithms. But first we give a brief description of the data set we used in the tests and how it was preprocessed.
Conclusions
In this paper we have presented two rather simple and computationally efficient algorithms based on linear and multilinear theory. We have shown that both algorithms achieve satisfactory classification results. In particular, the main algorithm in the paper gives an error rate of 5–5.5% even after the training set has been reduced by more than 98% prior to the construction of the class specific models.
References (21)
- et al., Pattern Classification (2001)
- et al., The elements of statistical learning (2001)
- Y. LeCun, L. Jackel, L. Bottou, C. Cortes, J. S. Denker, H. Drucker, I. Guyon, U. A. Müller, E. Säckinger, P. Simard,...
- et al., Gradient-based learning applied to document recognition, Proc. IEEE (1998)
- et al., Transformation invariance in pattern recognition—tangent distance and tangent propagation, Internat. J. Imag. Systems Technol. (2000)
- D. Keysers, J. Dahmen, T. Theiner, H. Ney, Experiments with an extended tangent distance, Technical Report, Lehrstuhl...
- P. Scattolin, Recognition of handwritten numerals using elastic matching, Master's Thesis, Department of Computer...
- et al., Modelling the manifolds of images of handwritten digits, IEEE Trans. Neural Networks (1997)
- et al., A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl. (2000)
- Pattern recognition by means of disjoint principal components models, Pattern Recognition (1975)