Abstract
Many supervised machine learning tasks can be cast as multi-class classification problems. Support vector machines (SVMs) excel at binary classification problems, but the elegant theory behind large-margin hyperplanes cannot be easily extended to their multi-class counterparts. On the other hand, it has been shown that the decision hyperplanes for binary classification obtained by SVMs are equivalent to the solutions obtained by Fisher's linear discriminant on the set of support vectors. Discriminant analysis approaches are well known in the statistical pattern recognition literature for learning discriminative feature transformations, and they can be easily extended to multi-class cases. The use of discriminant analysis, however, has not been fully explored experimentally in the data mining literature. In this paper, we explore the use of discriminant analysis for multi-class classification problems. We evaluate the performance of discriminant analysis on a large collection of benchmark datasets and investigate its usage in text categorization. Our experiments suggest that discriminant analysis provides a fast, efficient, yet accurate alternative for general multi-class classification problems.
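To make concrete how discriminant analysis handles more than two classes directly, the following is a minimal sketch of a textbook linear discriminant classifier with a pooled within-class covariance. It is not taken from the paper and may differ from the variants the authors evaluate. The names `solve`, `fit_lda`, `predict`, the `ridge` parameter, and the toy data are all assumptions introduced for illustration.

```python
from collections import defaultdict
from math import log


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_lda(points, labels, ridge=1e-6):
    """Estimate one linear score (weights, bias) per class from a shared covariance."""
    d = len(points[0])
    groups = defaultdict(list)
    for p, y in zip(points, labels):
        groups[y].append(p)
    means = {y: [sum(col) / len(g) for col in zip(*g)] for y, g in groups.items()}
    # Pooled within-class scatter, shared by all classes.
    cov = [[0.0] * d for _ in range(d)]
    for y, g in groups.items():
        for p in g:
            diff = [pi - mi for pi, mi in zip(p, means[y])]
            for i in range(d):
                for j in range(d):
                    cov[i][j] += diff[i] * diff[j]
    dof = max(len(points) - len(groups), 1)
    for i in range(d):
        for j in range(d):
            cov[i][j] /= dof
        cov[i][i] += ridge  # assumed small ridge so the matrix stays invertible
    model = {}
    for y, g in groups.items():
        w = solve(cov, means[y])  # w_k = Sigma^{-1} mu_k
        prior = len(g) / len(points)
        bias = -0.5 * sum(wi * mi for wi, mi in zip(w, means[y])) + log(prior)
        model[y] = (w, bias)
    return model


def predict(model, x):
    """Return the class whose linear discriminant score is largest."""
    def score(y):
        w, bias = model[y]
        return sum(wi * xi for wi, xi in zip(w, x)) + bias
    return max(model, key=score)


# Hypothetical three-class toy data in two dimensions.
train_x = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
           (3.0, 0.0), (3.2, 0.2), (2.9, 0.3),
           (0.0, 3.0), (0.2, 3.1), (0.1, 2.8)]
train_y = ["a"] * 3 + ["b"] * 3 + ["c"] * 3
model = fit_lda(train_x, train_y)
```

Each class receives a single linear score, so adding a class adds one weight vector rather than requiring a combination of binary classifiers. Calling `predict(model, (3.1, 0.1))` picks the class with the highest score for that point.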
Author information
Tao Li is currently an assistant professor in the School of Computer Science at Florida International University. He received his Ph.D. degree in Computer Science from the University of Rochester in 2004. His primary research interests are data mining, machine learning, bioinformatics, and music information retrieval.
Shenghuo Zhu is currently a researcher at NEC Laboratories America, Inc. He received his B.E. from Zhejiang University in 1994, B.E. from Tsinghua University in 1997, and Ph.D. degree in Computer Science from the University of Rochester in 2003. His primary research interests include information retrieval, machine learning, and data mining.
Mitsunori Ogihara received a Ph.D. in Information Sciences from the Tokyo Institute of Technology in 1993. He is currently Professor and Chair of the Department of Computer Science at the University of Rochester. His primary research interests are data mining, computational complexity, and molecular computation.
About this article
Cite this article
Li, T., Zhu, S. & Ogihara, M. Using discriminant analysis for multi-class classification: an experimental investigation. Knowl Inf Syst 10, 453–472 (2006). https://doi.org/10.1007/s10115-006-0013-y
DOI: https://doi.org/10.1007/s10115-006-0013-y