Abstract
In this paper we consider the problem of testing whether two samples of contaminated data arise from the same distribution. It is assumed that the contaminations are additive noises with known or estimated moments. This situation can also be viewed as two signals observed before and after perturbations; the problem is then to test the equality of both perturbations. The test statistic is based on the polynomial moments of the difference between observations and noises. The test is very simple and allows one to compare two independent as well as two paired contaminated samples. A data-driven procedure is proposed to select automatically the number of polynomials involved. We present a simulation study to investigate the power of the proposed test in both discrete and continuous cases. Real-data examples are presented to illustrate the method.
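As an illustration only, the following minimal Python sketch shows how a moment-corrected two-sample comparison of this kind can work for two independent samples. It assumes that each observation is Y = X + ε with ε independent of X, and that the noise moments E[ε^m] are known up to order k_max (the noise may differ between the two samples). The helper names (`correction_polys`, `h_values`, `data_driven_test`) are hypothetical; the polynomials h_k are plain moment corrections built so that E[h_k(Y)] = E[X^k], not necessarily the polynomial family used in the paper, and the Schwarz-type penalty k·log(n) used to choose the number of components is an assumption. The paired case is not covered.

```python
import math
import random
from statistics import fmean


def correction_polys(noise_moments, k_max):
    """Coefficients c[k][i] of h_k(y) = sum_i c[k][i] * y**i, k = 0..k_max.

    Built so that E[h_k(X + eps)] = E[X**k] whenever eps is independent of X
    and noise_moments[m] = E[eps**m] (noise_moments[0] == 1).
    """
    c = [[1.0]]  # h_0(y) = 1
    for k in range(1, k_max + 1):
        coeffs = [0.0] * k + [1.0]  # start from y**k
        # E[Y^k] = sum_j C(k, j) E[X^j] E[eps^(k-j)]; remove the j < k terms.
        for j in range(k):
            w = math.comb(k, j) * noise_moments[k - j]
            for i, a in enumerate(c[j]):
                coeffs[i] -= w * a
        c.append(coeffs)
    return c


def h_values(y, c):
    """Evaluate h_1(y), ..., h_K(y) for one observation y."""
    return [sum(a * y**i for i, a in enumerate(ck)) for ck in c[1:]]


def mean_and_cov(rows):
    """Sample mean vector and (n - 1)-normalised covariance matrix."""
    n, dim = len(rows), len(rows[0])
    mu = [fmean(r[a] for r in rows) for a in range(dim)]
    cov = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
            for b in range(dim)] for a in range(dim)]
    return mu, cov


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def data_driven_test(ys1, e1, ys2, e2, k_max):
    """Return (k, T_k): the number of corrected moments chosen by the
    (assumed) rule k = argmax_k (T_k - k log n) and its Wald statistic."""
    c1, c2 = correction_polys(e1, k_max), correction_polys(e2, k_max)
    m1, s1 = mean_and_cov([h_values(y, c1) for y in ys1])
    m2, s2 = mean_and_cov([h_values(y, c2) for y in ys2])
    n1, n2 = len(ys1), len(ys2)
    best = None
    for k in range(1, k_max + 1):
        d = [m1[i] - m2[i] for i in range(k)]  # difference of corrected moments
        v = [[s1[a][b] / n1 + s2[a][b] / n2 for b in range(k)] for a in range(k)]
        t = sum(di * xi for di, xi in zip(d, solve(v, d)))
        score = t - k * math.log(n1 + n2)
        if best is None or score > best[0]:  # ties keep the smaller k
            best = (score, k, t)
    return best[1], best[2]


if __name__ == "__main__":
    random.seed(1)
    # Hypothetical data: X ~ Exp(1) in both samples, observed with
    # N(0, 0.5^2) noise in sample 1 and N(0, 1) noise in sample 2.
    ys1 = [random.expovariate(1.0) + random.gauss(0, 0.5) for _ in range(500)]
    ys2 = [random.expovariate(1.0) + random.gauss(0, 1.0) for _ in range(500)]
    e1 = [1, 0, 0.25, 0, 3 * 0.25**2]  # E[eps^m], m = 0..4, for N(0, 0.5^2)
    e2 = [1, 0, 1, 0, 3]               # E[eps^m], m = 0..4, for N(0, 1)
    print(data_driven_test(ys1, e1, ys2, e2, k_max=4))
```

For a fixed k, T_k is a Wald-type statistic, so under the null hypothesis it is approximately chi-square with k degrees of freedom in large samples, provided the moments of order 2k exist; the sketch does not calibrate the statistic after k has been selected. With standard Gaussian noise the correction polynomials h_k reduce to the probabilists' Hermite polynomials (for example h_2(y) = y^2 - 1 and h_4(y) = y^4 - 6y^2 + 3).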
Pommeret, D. A two-sample test when data are contaminated. Stat Methods Appl 22, 501–516 (2013). https://doi.org/10.1007/s10260-013-0235-6