Small Sample Size in High Dimensional Space - Minimum Distance Based Classification

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8467)

Abstract

In this paper we present new results on classification in the small-sample, high-dimensional case. We discuss geometric properties of data structures in high dimensions. It is known that such data form an almost regular simplex in high dimensions, even if the covariance structure of the data is not the identity. We restrict our attention to two-class discrimination problems and assume that observations from the two classes follow multivariate normal distributions with a common covariance matrix. We develop the consequences of our finding that, in high dimensions, N Gaussian random points generate a sample covariance matrix estimate whose properties are similar to those of the covariance matrix of a normal distribution obtained by random projection onto a subspace of dimensionality N. Namely, the eigenvalues of both covariance matrices follow the same distribution. We examine classification results for minimum distance classifiers with dimensionality reduction based on principal component (PC) analysis of a singular sample covariance matrix and with a reduction obtained using normal random projections. Simulation studies confirm the theoretical analysis.
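
The two ideas in the abstract can be illustrated with a small NumPy sketch. It is not the paper's procedure: the identity covariance, the mean shift, the sample sizes, the Euclidean nearest-mean rule, and all function names (`pairwise_distance_spread`, `min_distance_classify`, `projected_accuracy`) are assumptions made for illustration only. The first function checks how close N standard Gaussian points in dimension d come to a regular simplex, whose edges would all have length near sqrt(2d). The second reduces the data with a normal random projection onto a subspace whose dimension equals the total training sample size and then applies a minimum distance classifier.

```python
import numpy as np


def pairwise_distance_spread(n_points, dim, rng):
    """Min and max pairwise distance among standard Gaussian points,
    each divided by sqrt(2 * dim). Values near 1 mean nearly equal edges."""
    x = rng.standard_normal((n_points, dim))
    # Squared distances from the Gram matrix: |a - b|^2 = |a|^2 + |b|^2 - 2 a.b
    gram = x @ x.T
    sq_norms = np.diag(gram)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram
    upper = np.triu_indices(n_points, k=1)  # each unordered pair once
    ratios = np.sqrt(np.maximum(sq_dist[upper], 0.0)) / np.sqrt(2.0 * dim)
    return float(ratios.min()), float(ratios.max())


def min_distance_classify(train_a, train_b, test):
    """Label each test row 0 or 1 by the nearer class sample mean.
    Euclidean distance is an assumption of this sketch."""
    mean_a = train_a.mean(axis=0)
    mean_b = train_b.mean(axis=0)
    dist_a = np.linalg.norm(test - mean_a, axis=1)
    dist_b = np.linalg.norm(test - mean_b, axis=1)
    return np.where(dist_a <= dist_b, 0, 1)


def projected_accuracy(dim=2000, n_train=10, n_test=200, shift=1.5, seed=0):
    """Test accuracy of the nearest-mean rule after a Gaussian random projection.
    All default values are arbitrary illustrative choices."""
    rng = np.random.default_rng(seed)
    k = 2 * n_train  # target dimension = total training sample size
    mean_b = np.full(dim, shift)  # class 0 has mean zero, class 1 is shifted
    train_a = rng.standard_normal((n_train, dim))
    train_b = rng.standard_normal((n_train, dim)) + mean_b
    test = np.vstack([
        rng.standard_normal((n_test, dim)),
        rng.standard_normal((n_test, dim)) + mean_b,
    ])
    labels = np.repeat([0, 1], n_test)
    # Normal random projection matrix with i.i.d. N(0, 1/k) entries
    projection = rng.standard_normal((dim, k)) / np.sqrt(k)
    predicted = min_distance_classify(
        train_a @ projection, train_b @ projection, test @ projection
    )
    return float(np.mean(predicted == labels))


if __name__ == "__main__":
    low, high = pairwise_distance_spread(10, 20000, np.random.default_rng(1))
    print("pairwise distance / sqrt(2d): min", low, "max", high)
    print("accuracy after random projection:", projected_accuracy())
```

A PC-based reduction could be swapped in by replacing `projection` with the leading right singular vectors of the centered training data; that variant is not shown here.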

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Skubalska-Rafajłowicz, E. (2014). Small Sample Size in High Dimensional Space - Minimum Distance Based Classification. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2014. Lecture Notes in Computer Science, vol 8467. Springer, Cham. https://doi.org/10.1007/978-3-319-07173-2_52

  • DOI: https://doi.org/10.1007/978-3-319-07173-2_52

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07172-5

  • Online ISBN: 978-3-319-07173-2

  • eBook Packages: Computer Science, Computer Science (R0)
