
Pattern Recognition

Volume 34, Issue 10, October 2001, Pages 2041-2047

A theorem on the uncorrelated optimal discriminant vectors

https://doi.org/10.1016/S0031-3203(00)00135-7

Abstract

This paper proposes a theorem on the uncorrelated optimal discriminant vectors (UODVs). It is proved that the classical optimal discriminant vectors are equivalent to UODV, which can be used to extract (L−1) uncorrelated discriminant features for L-class problems without losing any discriminant information in the sense of the Fisher discriminant criterion function. Experiments on the Concordia University CENPARMI handwritten numeral database indicate that UODV is much more powerful than the Foley–Sammon optimal discriminant vectors. It is believed that, when the number of training samples is large, the conjugate-orthogonal set of discriminant vectors can be much more powerful than the orthogonal set of discriminant vectors.

Introduction

It is well known that linear feature extraction is an efficient way of reducing dimensionality, and many linear feature extraction methods have been proposed to date. The Fisher linear discriminant vector [1] is a very useful technique for pattern analysis. The basic idea is to calculate the Fisher optimal discriminant vector under the condition that the Fisher criterion function takes an extremum, and then to construct a one-dimensional feature space by projecting the high-dimensional feature vector onto the obtained optimal discriminant vector. In 1962, Wilks proposed (L−1) vectors for L-class problems [2], [3], which are called the classical optimal discriminant vectors (CODVs).

Based on the Fisher linear discriminant method, Sammon proposed an optimal discriminant plane technique in 1970 [4]. In 1975, Foley and Sammon presented a set of optimal discriminant vectors for two-class problems [5], which are known as the Foley–Sammon optimal discriminant vectors (FSODVs). Kittler and Young [6] presented an approach to feature selection based on the Karhunen–Loève expansion in 1973. In 1977, Kittler [7] discussed the relationship between the Kittler–Young method and the FSODV method, and showed that the former is based on conjugate orthogonality constraints and is, from the point of view of dimensionality reduction, more powerful than the FSODV method, which is based on orthogonality constraints.
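For concreteness, the two kinds of constraints can be written as follows. Expressing the conjugate orthogonality with respect to the population covariance matrix St follows the later UODV formulation and is an assumption here, since the conjugating matrix is not restated in this passage.

```latex
% Orthogonality constraints (FSODV):
\varphi_i^{T}\,\varphi_j = 0, \qquad i \neq j ;
% Conjugate orthogonality constraints (as used for UODV):
\varphi_i^{T}\, S_t\, \varphi_j = 0, \qquad i \neq j .
```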

Okada and Tomita [8] proposed an optimal orthonormal system for discriminant analysis in 1985. Duchene and Leclercq [9] solved the problem of finding the set of FSODVs for multi-class problems. Hamamoto et al. proposed orthogonal discriminant analysis in a transformed space and presented a feature extraction method based on the modified “plus e-take away f” algorithm [10], [11]. These authors claimed that the orthogonal set of discriminant vectors is more powerful than CODV [8], [9], [10], [11].

Longstaff combined the Fisher vector with the Fukunaga–Koontz transform or a radius vector [12]. Liu et al. presented a generalized optimal set of discriminant vectors [13]. Jin et al. [14] proposed an optimal discriminant plane, which was shown to be more powerful than Sammon's plane on the Iris data. Jin et al. [15] presented a set of uncorrelated optimal discriminant vectors (UODVs), which was shown to be more powerful than FSODV and has been successfully used in face feature extraction [16].

There is a dimensionality problem in pattern feature extraction. It is believed that the accuracy of statistical pattern classifiers increases as the number of features increases, but decreases once the number becomes too large [17]. Fukunaga [18] showed that for L-class problems there are (L−1) ideal features for classification; in other words, for L-class problems the optimal number of features is (L−1). However, the ideal features are difficult to obtain in practice. Although CODV can be used to extract (L−1) features for L-class problems, it is not widely accepted to be effective and efficient.

In this paper, we present a theorem on UODV and discuss the effectiveness of CODV. The remainder of this paper is organized as follows: Section 2 gives an introduction to CODV, FSODV and UODV. Section 3 presents a theorem on UODV. Section 4 reports experiments on the Concordia University CENPARMI handwritten numeral database. A brief summary is given in Section 5.


CODV, FSODV and UODV

Let ω1, ω2, …, ωL be L known pattern classes and let X be an N-dimensional sample. Suppose that mi, Ci and Pi (i = 1, 2, …, L) are the mean vector, the covariance matrix and the a priori probability of class ωi, respectively. The between-class covariance matrix Sb, the within-class covariance matrix Sw and the population covariance matrix St are determined by the following formulas:

S_b = \sum_{i=1}^{L} P_i\,[m_i - E(X)][m_i - E(X)]^T,

S_w = \sum_{i=1}^{L} P_i\,E[(X - m_i)(X - m_i)^T \mid \omega_i] = \sum_{i=1}^{L} P_i C_i,

S_t = E\{[X - E(X)][X - E(X)]^T\} = S_b + S_w.
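As an illustration, a minimal NumPy sketch of these covariance matrices follows. The sample matrix X, the label vector y, the helper name scatter_matrices, and the use of empirical class frequencies as the prior probabilities Pi are assumptions for the example, not part of the paper.

```python
import numpy as np

def scatter_matrices(X, y):
    """Compute Sb, Sw, St from samples X (n x N) and class labels y (length n).

    The priors P_i are estimated as empirical class frequencies; this is an
    assumption made for the illustration, not a requirement of the method.
    """
    classes = np.unique(y)
    n, N = X.shape
    m = X.mean(axis=0)                           # E(X), the population mean
    Sb = np.zeros((N, N))
    Sw = np.zeros((N, N))
    for c in classes:
        Xc = X[y == c]
        Pi = len(Xc) / n                         # empirical prior of class c
        mi = Xc.mean(axis=0)                     # class mean m_i
        d = (mi - m).reshape(-1, 1)
        Sb += Pi * (d @ d.T)                     # between-class scatter term
        Ci = np.cov(Xc, rowvar=False, bias=True) # class covariance C_i
        Sw += Pi * Ci                            # within-class scatter term
    St = Sb + Sw                                 # population covariance matrix
    return Sb, Sw, St
```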

A theorem on UODV

In this section, we present a theorem on UODV and give some discussions.

Theorem 1

For L-class problems, suppose that the between-class covariance matrix Sb has rank (L−1) and the within-class covariance matrix Sw is nonsingular. Let the (L−1) nonzero eigenvalues of Sw−1Sb be represented and ordered from the largest to the smallest as

λ1 ⩾ λ2 ⩾ ⋯ ⩾ λL−1 > 0,

and suppose λi ≠ λj (i ≠ j). Then, for r ⩽ L−1 and regardless of the direction of the eigenvectors, the rth UODV ϕr is the rth eigenvector φr of Sw−1Sb corresponding to the rth largest eigenvalue λr.
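A minimal sketch of the construction suggested by Theorem 1 is given below, assuming Sw is nonsingular and the (L−1) nonzero eigenvalues are distinct. It forms Sw−1Sb with NumPy and keeps the eigenvectors in order of decreasing eigenvalue; the helper name uodv is introduced only for this illustration.

```python
import numpy as np

def uodv(Sb, Sw, n_classes):
    """Eigenvectors of Sw^{-1} Sb ordered by decreasing eigenvalue.

    Under the assumptions of Theorem 1 (Sw nonsingular, rank(Sb) = L-1,
    distinct nonzero eigenvalues), these L-1 vectors coincide with the
    uncorrelated optimal discriminant vectors.
    """
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    evals, evecs = evals.real, evecs.real     # drop negligible imaginary parts
    order = np.argsort(evals)[::-1]           # largest eigenvalue first
    order = order[: n_classes - 1]            # keep the L-1 nonzero eigenvalues
    return evecs[:, order]                    # columns are the discriminant vectors
```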

Experiments and analysis

Experiments have been performed to compare UODV with FSODV on the Concordia University CENPARMI handwritten numeral database. Four thousand samples are used for training, and the other 2000 samples for testing. Hu et al. [19] performed the preprocessing and extracted four kinds of features, as listed below; a sketch of the projection-and-classification pipeline follows the list:

XG: 256-dimensional Gabor transformation feature [20],
XL: 121-dimensional Legendre moment feature [21],
XP: 36-dimensional pseudo-Zernike moment feature [22],
XZ: 30-dimensional Zernike moment feature.
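To make the comparison pipeline concrete, here is a hedged sketch: the training features are used to estimate the scatter matrices, the discriminant vectors are computed as above, and both training and test features are projected onto them before classification. The helper names scatter_matrices and uodv refer to the illustrative sketches given earlier, and the nearest-class-mean classifier is purely an assumption for this example; the classifier actually used in the paper is not described in this snippet.

```python
import numpy as np

def project_and_classify(X_train, y_train, X_test):
    """Project features onto the discriminant vectors and classify by nearest class mean.

    scatter_matrices() and uodv() are the illustrative helpers defined above;
    the nearest-mean classifier is an assumption made for this sketch only.
    """
    Sb, Sw, St = scatter_matrices(X_train, y_train)
    classes = np.unique(y_train)
    W = uodv(Sb, Sw, n_classes=len(classes))     # N x (L-1) projection matrix
    Z_train = X_train @ W                        # (L-1)-dimensional projected features
    Z_test = X_test @ W
    means = np.array([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(Z_test[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]     # predicted class labels
```

For the 10-class digit problem this yields L−1 = 9 projected features per sample.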

Conclusions

This paper proposes a theorem on UODV. CODVs are proved to be equivalent to UODV, which can be used to extract (L−1) uncorrelated discriminant features for L-class problems without losing any discriminant information in the sense of the Fisher discriminant criterion. The classifiability criterion function (6) can be said to be equivalent to the Fisher criterion function (8) under the conjugate orthogonality constraints (10).

Experiments on the Concordia University CENPARMI handwritten numeral database indicate that UODV is much more powerful than FSODV.

Acknowledgements

We wish to thank K. Liu and C. Y. Suen of Concordia University for their support with the CENPARMI handwritten numeral database.


References (23)

  • J. Duchene, S. Leclercq, An optimal transformation for discriminant and principal component analysis, IEEE Trans. Pattern Anal. Mach. Intell. (1988)

About the Author—ZHONG JIN was born in Jiangsu, China, on 4th December 1961. He received the B.S. degree in Mathematics, the M.S. degree in Applied Mathematics and the Ph.D. degree in Pattern Recognition and Intelligence System from Nanjing University of Science and Technology (NUST), Nanjing, China, in 1982, 1984 and 1999, respectively. He is now an associate professor in the Department of Computer Science at NUST. He is the author of over 10 scientific papers in pattern recognition, image processing, and artificial intelligence. His current interests are in the areas of pattern recognition, video image processing, face recognition, and content-based image retrieval.

About the Author—JING-YU YANG received the B.S. degree in Computer Science from Nanjing University of Science and Technology (NUST), Nanjing, China. From 1982 to 1984, he was a visiting scientist at the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign. He is currently Professor and Chairman in the Department of Computer Science at NUST. He is the author of over 100 scientific papers in computer vision, pattern recognition, robotics, and artificial intelligence. His current interests are in the areas of pattern recognition, robot vision, image processing, and artificial intelligence.

About the Author—ZHEN-MIN TANG received the B.S. and M.S. degrees in Computer Science from Nanjing University of Science and Technology (NUST), Nanjing, China. He is now a Professor in the Department of Computer Science at NUST. He is the author of over 40 scientific papers in pattern recognition, image processing, and artificial intelligence. His current interests are in the areas of pattern recognition, image processing, artificial intelligence, and expert systems.

About the Author—ZHONG-SHAN HU was born in Jiangsu, China, in 1973. He received the B.S. degree in Applied Mathematics and the Ph.D. degree in Pattern Recognition and Intelligence System from Nanjing University of Science and Technology (NUST), Nanjing, China, in 1995 and 1999, respectively. He is now an assistant professor in the Department of Computer Science at NUST. He is the author of over 10 scientific papers in pattern recognition, image processing, and artificial intelligence. His current interests are in the areas of pattern recognition, image processing, handwritten numeral recognition, and face recognition.
