Abstract
We investigate the effects of dimensionality reduction, using different techniques and different reduced dimensions, on six two-class data sets with numerical attributes, as pre-processing for two classification algorithms. Besides reducing the dimensionality with principal components and linear discriminants, we also introduce four new techniques. After this dimensionality reduction, two algorithms are applied: the first takes advantage of the reduced dimensionality itself, while the second directly exploits the dimensional ranking. We observe that neither a single superior dimensionality reduction technique nor a straightforward way to select the optimal dimension can be identified. On the other hand, we show that a good choice of technique and dimension can have a major impact on classification power, generating classifiers that can rival industry standards. We conclude that dimensionality reduction should be used not only for visualisation or as pre-processing on very high dimensional data, but also as a general pre-processing technique on numerical data to raise classification power. The difficult choice of both the dimensionality reduction technique and the reduced dimension, however, should be based directly on their effects on classification power.
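The pipeline the abstract describes, reducing dimensionality before training a classifier, can be sketched as follows. This is a minimal illustration using principal components and a nearest-centroid classifier on synthetic two-class data; the data, the classifier, and the fixed choice of reduced dimension are all assumptions for illustration, not the paper's actual techniques or data sets.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic classes in 10 dimensions; only the first axis separates them.
X0 = rng.normal(0.0, 1.0, size=(50, 10)); X0[:, 0] += 3.0
X1 = rng.normal(0.0, 1.0, size=(50, 10)); X1[:, 0] -= 3.0
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

def pca_reduce(X, k):
    """Project X onto its first k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# Reduced dimension k is fixed here; the paper argues this choice
# should itself be driven by the resulting classification power.
Z = pca_reduce(X, 2)

# Simple nearest-centroid classifier in the reduced space (illustrative).
c0 = Z[y == 0].mean(axis=0)
c1 = Z[y == 1].mean(axis=0)
pred = (np.linalg.norm(Z - c1, axis=1) < np.linalg.norm(Z - c0, axis=1)).astype(int)
accuracy = (pred == y).mean()
print(f"accuracy in {Z.shape[1]}-D reduced space: {accuracy:.2f}")
```

In practice one would sweep over both the reduction technique and the dimension k, scoring each combination by cross-validated classification accuracy, which is the selection criterion the abstract advocates.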
Partially supported by the research project OZR1372 of the Vrije Universiteit Brussel.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Plastria, F., De Bruyne, S., Carrizosa, E. (2008). Dimensionality Reduction for Classification. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2008. Lecture Notes in Computer Science(), vol 5139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88192-6_38
DOI: https://doi.org/10.1007/978-3-540-88192-6_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88191-9
Online ISBN: 978-3-540-88192-6
eBook Packages: Computer Science (R0)