Abstract
The multivariate two-sample problem has been extensively investigated, and various methods have been proposed. However, most two-sample tests perform poorly when applied to high-dimensional data, and many of them are not applicable when the dimension of the data exceeds the sample size. We reconsider two previously reported tests (Baringhaus and Franz in Stat Sin 20:1333–1361, 2010; Biswas and Ghosh in J Multivar Anal 123:160–171, 2014), and propose two new criteria. Simulations demonstrate that the power of the proposed test is stable for high-dimensional data and large samples, and the power of our test is equivalent to that of the test by Biswas and Ghosh when the covariance matrices are different. We also investigate the theoretical properties of our test when the dimension tends to infinity and the sample size is fixed, and when the dimension is fixed and the sample size tends to infinity. In these cases, the proposed test is asymptotically distribution-free and consistent.
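To make the idea of a test based on inter-point distances concrete, the following is a minimal sketch of a classical statistic of this family, in the spirit of Baringhaus and Franz (2004): it compares the mean between-sample Euclidean distance with the mean within-sample distances and calibrates the result by permutation. It is not the criterion proposed in this article. The function names, the permutation calibration, and the defaults (`n_perm`, `seed`) are assumptions chosen for illustration.

```python
import numpy as np


def mean_pairwise_distance(a, b):
    """Mean Euclidean distance over all pairs (a_i, b_j)."""
    diffs = a[:, None, :] - b[None, :, :]          # shape (len(a), len(b), d)
    return np.linalg.norm(diffs, axis=2).mean()


def interpoint_statistic(x, y):
    """Energy-type statistic built only from inter-point distances.

    T = mn/(m+n) * (mean|X - Y| - 0.5 * mean|X - X'| - 0.5 * mean|Y - Y'|),
    where the within-sample means run over all m^2 (resp. n^2) ordered pairs,
    including the zero distances on the diagonal.
    """
    m, n = len(x), len(y)
    between = mean_pairwise_distance(x, y)
    within_x = mean_pairwise_distance(x, x)
    within_y = mean_pairwise_distance(y, y)
    return m * n / (m + n) * (between - 0.5 * within_x - 0.5 * within_y)


def permutation_pvalue(x, y, n_perm=500, seed=0):
    """Permutation p-value for the statistic above (illustrative calibration)."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    m = len(x)
    observed = interpoint_statistic(x, y)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))          # relabel the pooled sample
        stat = interpoint_statistic(pooled[idx[:m]], pooled[idx[m:]])
        exceed += int(stat >= observed)
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (exceed + 1) / (n_perm + 1)
```

Because only pairwise distances enter the statistic, it can be computed when the dimension exceeds the sample size, for example with `x` of shape `(20, 50)` and `y` of shape `(20, 50)`. This sketch forms an array of size m × n × d, so it is suited to small illustrative samples only.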
References
Bai Z, Saranadasa H (1996) Effect of high dimension: by an example of a two-sample problem. Stat Sin 6:311–329
Baringhaus L, Franz C (2004) On a new multivariate two-sample test. J Multivar Anal 88:190–206
Baringhaus L, Franz C (2010) Rigid motion invariant two-sample tests. Stat Sin 20:1333–1361
Biswas M, Ghosh AK (2014) A nonparametric two-sample test applicable to high dimensional data. J Multivar Anal 123:160–171
Biswas M, Mukhopadhyay M, Ghosh AK (2014) A distribution-free two-sample run test applicable to high dimensional data. Biometrika 101:913–926
Chen SX, Qin Y-L (2010) A two-sample test for high-dimensional data with applications to gene-set testing. Ann Stat 38:808–835
Choi K, Marden J (1997) An approach to multivariate rank tests in multivariate analysis of variance. J Am Stat Assoc 92:1581–1590
Friedman JH, Rafsky LC (1979) Multivariate generalizations of the Wald–Wolfowitz and Smirnov two-sample tests. Ann Stat 7:697–717
Ghosh AK, Biswas M (2016) Distribution-free high-dimensional two-sample tests based on discriminating hyperplanes. Test 25:525–547
Gretton A, Borgwardt KM, Rasch MJ, Schölkopf B, Smola A (2012) A kernel two-sample test. J Mach Learn Res 13:723–773
Hall P, Marron JS, Neeman A (2005) Geometric representation of high dimension, low sample size data. J R Stat Soc B 67:427–444
Hall P, Tajvidi N (2002) Permutation tests for equality of distributions in high dimensional settings. Biometrika 89:359–374
Henze N (1988) A multivariate two-sample test based on the number of nearest neighbor type coincidences. Ann Stat 16:772–783
Hettmansperger TP, Möttönen J, Oja H (1998) Affine invariant multivariate rank tests for several samples. Stat Sin 8:785–800
Hettmansperger TP, Oja H (1994) Affine invariant multivariate multi-sample sign tests. J R Stat Soc B 56:235–249
Lee AJ (1990) \(U\)-statistics: theory and practice. Marcel Dekker, New York
Liu Z, Modarres R (2011) A triangle test for equality of distribution functions in high dimensions. J Nonparametr Stat 23:605–615
Liu RY, Singh K (1993) A quality index based on data depth and multivariate rank tests. J Am Stat Assoc 88:252–260
Maa J-F, Pearl DK, Bartoszyński R (1996) Reducing multidimensional two-sample data to one-dimensional inter-point comparisons. Ann Stat 24:1069–1074
Mondal PK, Biswas M, Ghosh AK (2015) On high dimensional two sample tests based on nearest neighbors. J Multivar Anal 141:168–178
Möttönen J, Oja H (1995) Multivariate spatial sign and rank methods. J Nonparametr Stat 5:201–213
Park J, Ayyala DN (2013) A test for the mean vector in large dimension and small samples. J Stat Plan Inference 143:929–943
Puri ML, Sen PK (1971) Nonparametric methods in multivariate analysis. Wiley, New York
Randles RH, Peters D (1990) Multivariate rank tests for the two-sample location problem. Commun Stat Theory Methods 19:4225–4238
Rosenbaum PR (2005) An exact distribution-free test comparing two multivariate distributions based on adjacency. J R Stat Soc B 67:515–530
Rousson V (2002) On distribution-free tests for the multivariate two-sample location–scale model. J Multivar Anal 80:43–57
Schilling MF (1986) Multivariate two-sample tests based on nearest neighbors. J Am Stat Assoc 81:799–806
Srivastava MS (2009) A test for the mean vector with fewer observations than the dimension under non-normality. J Multivar Anal 100:518–532
Srivastava MS, Katayama S, Kano Y (2013) A two sample test in high dimensional data. J Multivar Anal 114:349–358
Wang L, Peng B, Li R (2015) A high-dimensional nonparametric multivariate test for mean vector. J Am Stat Assoc 110:1658–1669
Zech G, Aslan B (2003) A multivariate two-sample test based on the concept of minimum energy. In: Proceedings of the conference on statistical problems in particle physics, astrophysics and cosmology (PHYSTAT 2003), SLAC, Menlo Park, Stanford, California, 8–11 September, pp 97–100
Acknowledgements
The author would like to thank the referees sincerely for their careful reading and numerous valuable comments that helped to improve the original manuscript.
Cite this article
Tsukada, Si. High dimensional two-sample test based on the inter-point distance. Comput Stat 34, 599–615 (2019). https://doi.org/10.1007/s00180-017-0777-4
DOI: https://doi.org/10.1007/s00180-017-0777-4