Abstract
In this paper we present new results on classification in the small-sample, high-dimensional setting. We discuss geometric properties of data structures in high dimensions. It is known that, in high dimensions, such data form an almost regular simplex, even if the covariance matrix of the data is not the identity. We restrict our attention to two-class discrimination problems. Observations from the two classes are assumed to follow multivariate normal distributions with a common covariance matrix. We develop the consequences of our finding that, in high dimensions, N Gaussian random points generate a sample covariance matrix estimate whose properties are similar to those of the covariance matrix of a normal distribution obtained by random projection onto a subspace of dimensionality N. Namely, the eigenvalues of both covariance matrices follow the same distribution. We examine classification results obtained for minimum distance classifiers with dimensionality reduction based on principal component (PC) analysis of a singular sample covariance matrix, and with a reduction obtained using normal random projections. Simulation studies are provided that confirm the theoretical analysis.
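The setting described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's experimental protocol: the dimension, sample sizes, mean shift, diagonal covariance, and all function names are assumptions chosen for illustration, and the Euclidean nearest-mean rule stands in for the minimum distance classifier. It measures how tightly pairwise distances concentrate (the near-regular simplex) and compares nearest-mean accuracy after a Gaussian random projection and after PCA of the pooled, singular sample covariance.

```python
import numpy as np


def pairwise_distance_spread(X):
    """Relative spread (std / mean) of all pairwise Euclidean distances."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    upper = np.triu_indices(X.shape[0], k=1)
    dist = np.sqrt(np.maximum(d2[upper], 0.0))
    return dist.std() / dist.mean()


def nearest_mean_predict(X_train, y_train, X_test):
    """Minimum (Euclidean) distance rule: assign to the closer class mean."""
    means = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
    dists = np.linalg.norm(X_test[:, None, :] - means[None, :, :], axis=2)
    return dists.argmin(axis=1)


def random_projection(d, k, rng):
    """d x k matrix with i.i.d. N(0, 1/k) entries (normal random projection)."""
    return rng.standard_normal((d, k)) / np.sqrt(k)


def pca_projection(X_train, y_train, k):
    """Top-k principal directions of the pooled (singular) sample covariance."""
    centered = X_train.astype(float).copy()
    for c in (0, 1):
        centered[y_train == c] -= centered[y_train == c].mean(axis=0)
    # Right singular vectors of the centered data are the eigenvectors
    # of the pooled sample covariance matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T


def simulate(d=2000, n_per_class=10, n_test=200, k=10, shift=0.3, seed=0):
    """All default values are illustrative assumptions, not from the paper."""
    rng = np.random.default_rng(seed)
    # Common diagonal covariance that is deliberately not the identity.
    scales = rng.uniform(0.5, 1.5, size=d)
    mu = np.full(d, shift)

    def sample(n, label):
        return label * mu + rng.standard_normal((n, d)) * scales

    y_train = np.repeat([0, 1], n_per_class)
    X_train = np.vstack([sample(n_per_class, 0), sample(n_per_class, 1)])
    y_test = np.repeat([0, 1], n_test // 2)
    X_test = np.vstack([sample(n_test // 2, 0), sample(n_test // 2, 1)])

    results = {"simplex_spread": pairwise_distance_spread(X_train[y_train == 0])}
    maps = {
        "random_projection": random_projection(d, k, rng),
        "pca": pca_projection(X_train, y_train, k),
    }
    for name, P in maps.items():
        pred = nearest_mean_predict(X_train @ P, y_train, X_test @ P)
        results[name] = float(np.mean(pred == y_test))
    return results


if __name__ == "__main__":
    print(simulate())
```

A small value of `simplex_spread` means that the sample points are nearly equidistant, which is the almost regular simplex geometry; rerunning with a much smaller `d` should give a visibly larger spread. The two accuracy values depend strongly on the assumed mean shift and covariance, so they should be read only as a qualitative comparison.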
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Skubalska-Rafajłowicz, E. (2014). Small Sample Size in High Dimensional Space - Minimum Distance Based Classification. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2014. Lecture Notes in Computer Science, vol 8467. Springer, Cham. https://doi.org/10.1007/978-3-319-07173-2_52
DOI: https://doi.org/10.1007/978-3-319-07173-2_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07172-5
Online ISBN: 978-3-319-07173-2
eBook Packages: Computer Science (R0)