Small Sample Size in High Dimensional Space - Minimum Distance Based Classification

Skubalska-Rafajłowicz, Ewa

doi:10.1007/978-3-319-07173-2_52


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8467)

Included in the following conference series:

International Conference on Artificial Intelligence and Soft Computing


Abstract

In this paper we present new results on classification in the small-sample, high-dimensional case. We discuss geometric properties of data structures in high dimensions. It is known that, in high dimensions, such data form an almost regular simplex, even if the covariance matrix of the data is not the identity. We restrict our attention to two-class discrimination problems, assuming that observations from both classes are multivariate normal with a common covariance matrix. We develop consequences of our finding that, in high dimensions, N Gaussian random points generate a sample covariance matrix estimate whose properties are similar to those of the covariance matrix of a normal distribution obtained by random projection onto a subspace of dimensionality N; namely, the eigenvalues of both covariance matrices follow the same distribution. We examine the classification results of minimum distance classifiers combined with two kinds of dimensionality reduction: principal component (PC) analysis of the singular sample covariance matrix, and normal random projections. Simulation studies confirm the theoretical analysis.
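Two ideas from the abstract, the near-regular-simplex geometry of high-dimensional Gaussian samples and minimum distance classification after a normal random projection, can be illustrated with a short simulation. The following is a minimal sketch under assumptions of our own, not the author's experimental setup: a diagonal common covariance Σ with alternating variances 0.5 and 1.5 (so Σ is not the identity, but tr Σ = d), class means 0 and (1, ..., 1), and a k × d projection matrix with i.i.d. N(0, 1/k) entries. All function names are hypothetical, and the PCA-based reduction also studied in the paper is not shown. `simplex_ratios` rescales pairwise distances by sqrt(2 tr Σ), and `min_distance_rp_accuracy` fits a nearest-class-mean classifier in the projected space.

```python
import math
import random


def sample_gaussian(mean, variances, rng):
    """Draw one point from N(mean, diag(variances)) (diagonal covariance assumed)."""
    return [m + rng.gauss(0.0, math.sqrt(v)) for m, v in zip(mean, variances)]


def distance(x, y):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def simplex_ratios(n, d, variances, rng):
    """Pairwise distances of n zero-mean Gaussian points divided by sqrt(2 * trace(Sigma)).

    In high dimension these ratios concentrate near 1, i.e. the points lie
    close to the vertices of a regular simplex.
    """
    pts = [sample_gaussian([0.0] * d, variances, rng) for _ in range(n)]
    scale = math.sqrt(2.0 * sum(variances))
    return [distance(pts[i], pts[j]) / scale
            for i in range(n) for j in range(i + 1, n)]


def random_projection_matrix(k, d, rng):
    """k x d matrix with i.i.d. N(0, 1/k) entries (assumed normal random projection)."""
    s = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, s) for _ in range(d)] for _ in range(k)]


def project(R, x):
    """Map a d-dimensional vector x to R @ x in k dimensions."""
    return [sum(r_j * x_j for r_j, x_j in zip(row, x)) for row in R]


def mean_vector(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def min_distance_rp_accuracy(d, k, n_train, n_test, shift, variances, rng):
    """Test accuracy of a minimum (Euclidean) distance classifier to the
    estimated class means, applied after a normal random projection to k dims."""
    mu0 = [0.0] * d
    mu1 = [shift] * d
    R = random_projection_matrix(k, d, rng)
    train0 = [project(R, sample_gaussian(mu0, variances, rng)) for _ in range(n_train)]
    train1 = [project(R, sample_gaussian(mu1, variances, rng)) for _ in range(n_train)]
    m0, m1 = mean_vector(train0), mean_vector(train1)
    correct = 0
    for label, mu in ((0, mu0), (1, mu1)):
        for _ in range(n_test):
            z = project(R, sample_gaussian(mu, variances, rng))
            pred = 0 if distance(z, m0) <= distance(z, m1) else 1
            correct += int(pred == label)
    return correct / (2 * n_test)


if __name__ == "__main__":
    rng = random.Random(0)
    d = 1000
    # Non-identity diagonal covariance with trace(Sigma) = d.
    variances = [0.5 if i % 2 == 0 else 1.5 for i in range(d)]
    ratios = simplex_ratios(10, d, variances, rng)
    print("distance ratios: min %.3f, max %.3f" % (min(ratios), max(ratios)))
    # k = 20 equals the total training sample size N = 2 * n_train.
    acc = min_distance_rp_accuracy(d, 20, 10, 50, 1.0, variances, rng)
    print("accuracy after random projection: %.2f" % acc)
```

With d = 1000, the 45 distance ratios for 10 points should all lie close to 1, which is the simplex effect described above. The projection dimension k = 20 is chosen equal to the total training sample size N, mirroring the N-dimensional subspace mentioned in the abstract.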



Author information

Authors and Affiliations

Institute of Computer Engineering, Automatics and Robotics, Department of Electronics, Wrocław University of Technology, Poland
Ewa Skubalska-Rafajłowicz


Editor information

Editors and Affiliations

Częstochowa University of Technology, Armii Krajowej 36, 42-200, Częstochowa, Poland
Leszek Rutkowski, Marcin Korytkowski & Rafał Scherer
AGH University of Science and Technology, Mickiewicza 30, 30-059, Kraków, Poland
Ryszard Tadeusiewicz
Computer Science Division, Department of Electrical Engineering and Computer Sciences, University of California Berkeley, 94720-1776, Berkeley, CA, USA
Lotfi A. Zadeh
Computational Intelligence Laboratory, Electrical and Computer Engineering, University of Louisville, 405 Lutz Hall, 40292, Louisville, KY, USA
Jacek M. Zurada


About this paper

Cite this paper

Skubalska-Rafajłowicz, E. (2014). Small Sample Size in High Dimensional Space - Minimum Distance Based Classification. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2014. Lecture Notes in Computer Science, vol 8467. Springer, Cham. https://doi.org/10.1007/978-3-319-07173-2_52


DOI: https://doi.org/10.1007/978-3-319-07173-2_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07172-5
Online ISBN: 978-3-319-07173-2
eBook Packages: Computer Science, Computer Science (R0)
