High dimensional two-sample test based on the inter-point distance

  • Original Paper
  • Published in: Computational Statistics

Abstract

The multivariate two-sample problem has been extensively investigated, and various methods have been proposed. However, most two-sample tests perform poorly when applied to high-dimensional data, and many of them are not applicable when the dimension of the data exceeds the sample size. We reconsider two previously reported tests (Baringhaus and Franz in Stat Sin 20:1333–1361, 2010; Biswas and Ghosh in J Multivar Anal 123:160–171, 2014), and propose two new criteria. Simulations demonstrate that the power of the proposed test is stable for high-dimensional data and large samples, and the power of our test is equivalent to that of the test by Biswas and Ghosh when the covariance matrices are different. We also investigate the theoretical properties of our test when the dimension tends to infinity and the sample size is fixed, and when the dimension is fixed and the sample size tends to infinity. In these cases, the proposed test is asymptotically distribution-free and consistent.
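To make the idea of a test built on inter-point distances concrete, the following is a minimal Python sketch. It does not implement the two criteria proposed in the paper, which the abstract does not define. It uses a generic energy-type statistic (twice the mean between-sample distance minus the two mean within-sample distances, in the spirit of Baringhaus and Franz 2004) and calibrates it by permutation. The function names `energy_stat` and `permutation_test`, the Euclidean distance, and the default of 499 permutations are all assumptions made for illustration.

```python
import numpy as np


def energy_stat(d, idx_x, idx_y):
    """Energy-type statistic from a pooled distance matrix d (illustrative)."""
    dxy = d[np.ix_(idx_x, idx_y)].mean()  # mean between-sample distance
    dxx = d[np.ix_(idx_x, idx_x)].mean()  # mean within-sample distance, sample 1
    dyy = d[np.ix_(idx_y, idx_y)].mean()  # mean within-sample distance, sample 2
    return 2.0 * dxy - dxx - dyy


def permutation_test(x, y, n_perm=499, seed=0):
    """Return (statistic, permutation p-value) for H0: same distribution."""
    rng = np.random.default_rng(seed)
    m = len(x)
    pooled = np.vstack([x, y])
    # All Euclidean inter-point distances of the pooled sample, computed once.
    d = np.linalg.norm(pooled[:, None, :] - pooled[None, :, :], axis=2)
    idx = np.arange(len(pooled))
    observed = energy_stat(d, idx[:m], idx[m:])
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(len(pooled))  # random relabelling of the pooled sample
        if energy_stat(d, p[:m], p[m:]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Dimension (200) larger than each sample size (20), as in the HDLSS setting.
    x = rng.normal(size=(20, 200))
    y = rng.normal(loc=1.0, size=(20, 200))
    stat, p_value = permutation_test(x, y)
    print(f"statistic = {stat:.3f}, p-value = {p_value:.3f}")
```

Because only the pairwise distance matrix is needed, the sketch runs unchanged when the dimension exceeds the sample size, which is the setting the abstract is concerned with. The permutation calibration is a convenience for this illustration and is not the asymptotic, distribution-free calibration studied in the paper.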

References

  • Bai Z, Saranadasa H (1996) Effect of high dimension: by an example of a two-sample problem. Stat Sin 6:311–329

  • Baringhaus L, Franz C (2004) On a new multivariate two-sample test. J Multivar Anal 88:190–206

  • Baringhaus L, Franz C (2010) Rigid motion invariant two-sample tests. Stat Sin 20:1333–1361

  • Biswas M, Ghosh AK (2014) A nonparametric two-sample test applicable to high dimensional data. J Multivar Anal 123:160–171

  • Biswas M, Mukhopadhyay M, Ghosh AK (2014) A distribution-free two-sample run test applicable to high dimensional data. Biometrika 101:913–926

  • Chen SX, Qin Y-L (2010) A two-sample test for high-dimensional data with applications to gene-set testing. Ann Stat 38:808–835

  • Choi K, Marden J (1997) An approach to multivariate rank tests in multivariate analysis of variance. J Am Stat Assoc 92:1581–1590

  • Friedman JH, Rafsky LC (1979) Multivariate generalizations of the Wald–Wolfowitz and Smirnov two-sample tests. Ann Stat 7:697–717

  • Ghosh AK, Biswas M (2016) Distribution-free high-dimensional two-sample tests based on discriminating hyperplanes. Test 25:525–547

  • Gretton A, Borgwardt KM, Rasch MJ, Schölkopf B, Smola A (2012) A kernel two-sample test. J Mach Learn Res 13:723–773

  • Hall P, Marron JS, Neeman A (2005) Geometric representation of high dimension, low sample size data. J R Stat Soc B 67:427–444

  • Hall P, Tajvidi N (2002) Permutation tests for equality of distributions in high dimensional settings. Biometrika 89:359–374

  • Henze N (1988) A multivariate two-sample test based on the number of nearest neighbor type coincidences. Ann Stat 16:772–783

  • Hettmansperger TP, Möttönen J, Oja H (1998) Affine invariant multivariate rank tests for several samples. Stat Sin 8:785–800

  • Hettmansperger TP, Oja H (1994) Affine invariant multivariate multi-sample sign tests. J R Stat Soc B 56:235–249

  • Lee AJ (1990) \(U\)-statistics: theory and practice. Marcel Dekker, New York

  • Liu Z, Modarres R (2011) A triangle test for equality of distribution functions in high dimensions. J Nonparametr Stat 23:605–615

  • Liu RY, Singh K (1993) A quality index based on data depth and multivariate rank tests. J Am Stat Assoc 88:252–260

  • Maa J-F, Pearl DK, Bartoszyński R (1996) Reducing multidimensional two-sample data to one-dimensional inter-point comparisons. Ann Stat 24:1069–1074

  • Mondal PK, Biswas M, Ghosh AK (2015) On high dimensional two sample tests based on nearest neighbors. J Multivar Anal 141:168–178

  • Möttönen J, Oja H (1995) Multivariate spatial sign and rank methods. J Nonparametr Stat 5:201–213

  • Park J, Ayyala DN (2013) A test for the mean vector in large dimension and small samples. J Stat Plan Inference 143:929–943

  • Puri ML, Sen PK (1971) Nonparametric methods in multivariate analysis. Wiley, New York

  • Randles RH, Peters D (1990) Multivariate rank tests for the two-sample location problem. Commun Stat Theory Methods 19:4225–4238

  • Rosenbaum PR (2005) An exact distribution-free test comparing two multivariate distributions based on adjacency. J R Stat Soc B 67:515–530

  • Rousson V (2002) On distribution-free tests for the multivariate two-sample location–scale model. J Multivar Anal 80:43–57

  • Schilling MF (1986) Multivariate two-sample tests based on nearest neighbors. J Am Stat Assoc 81:799–806

  • Srivastava MS (2009) A test for the mean vector with fewer observations than the dimension under non-normality. J Multivar Anal 100:518–532

  • Srivastava MS, Katayama S, Kano Y (2013) A two sample test in high dimensional data. J Multivar Anal 114:349–358

  • Wang L, Peng B, Li R (2015) A high-dimensional nonparametric multivariate test for mean vector. J Am Stat Assoc 110:1658–1669

  • Zech G, Aslan B (2003) A multivariate two-sample test based on the concept of minimum energy. In: Proceedings of the conference on statistical problems in particle physics, astrophysics and cosmology (PHYSTAT 2003), SLAC, Menlo Park, Stanford, California, 8–11 September, pp 97–100

Acknowledgements

The author would like to thank the referees sincerely for their careful reading and numerous valuable comments that helped to improve the original manuscript.

Author information

Corresponding author

Correspondence to Shin-ichi Tsukada.

About this article

Cite this article

Tsukada, Si. High dimensional two-sample test based on the inter-point distance. Comput Stat 34, 599–615 (2019). https://doi.org/10.1007/s00180-017-0777-4

  • DOI: https://doi.org/10.1007/s00180-017-0777-4
