
Some hypothesis tests based on random projection

Original Paper
Computational Statistics

Abstract

Two new non-parametric tests based on continuous one-dimensional random projections are proposed. The first addresses central symmetry and the second addresses independence. The tests are implemented for finite- and infinite-dimensional (functional) data sets. Both tests are distribution-free and universally consistent. In addition, several techniques are proposed to improve the power of the tests. A simulation study comparing the new tests with existing ones yields promising results. An application to real data in Banach spaces is also developed.


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6


Notes

  1. https://archive.ics.uci.edu/ml/datasets/EEG+Database. A detailed description of the data set is available in Zhang et al. (1995). The database contains measurements from 64 electrodes placed on the subjects’ scalps and sampled at 256 Hz (3.9-msec epoch) for 1 sec. In this study, 9 triples of nodes were used, that is, 27 of the 64 electrodes, as shown in Fig. 4. Each subject was exposed to either a single stimulus (S1) or two stimuli (S1 and S2); the stimuli were pictures of objects chosen from a picture set. Each observation consists of 256 measurements taken over one second and is labeled with the subject id, the group and the sample. The EEG values are expressed in microvolts. To remove noise, a spectral decomposition of the signals was performed using the fast Fourier transform. The association between the different triples is studied as shown in Fig. 4. To simplify the procedure, the remaining signals were not processed.

References

  • Aki S (1993) On nonparametric tests for symmetry in \(R^m\). Ann Inst Stat Math 45:787–800

  • Albert P, Ratnasinghe D, Tangrea J, Wacholder S (2001) Limitations of the case-only design for identifying gene-environment interactions. Am J Epidemiol 154:687–693

  • Azzalini A, Dalla Valle A (1996) The multivariate skew-normal distribution. Biometrika 83:715–726

  • Blough DK (1989) Multivariate symmetry via projection pursuit. Ann Inst Stat Math 41:461–475

  • Brandwein A, Strawderman W (1991) Generalizations of James–Stein estimators under spherical symmetry. Ann Stat 19:1639–1650

  • Cramér H, Wold H (1936) Some theorems on distribution functions. J Lond Math Soc 11:290–294

  • Cuesta-Albertos JA, Fraiman R, Ransford T (2006) Random projections and goodness of fit tests in infinite-dimensional spaces. Bull Braz Math Soc 37:477–501

  • Cuesta-Albertos JA, Fraiman R, Ransford T (2007) A sharp form of the Cramér–Wold theorem. J Theor Probab 20:201–209

  • Cuevas A, Fraiman R (2009) On depth measures and dual statistics. A methodology for dealing with general data. J Multivariate Anal 100:753–766

  • Dauwels J, Vialatte F, Cichocki A (2010) Diagnosis of Alzheimer’s disease from EEG signals: where are we standing? Curr Alzheimer Res 7:487–505

  • Dyckerhoff R, Ley C, Paindaveine D (2015) Depth-based runs test for bivariate central symmetry. Ann Inst Stat Math 67:917–941

  • Einmahl J, Gan Z (2016) Testing for central symmetry. J Stat Plann Inference 169:27–33

  • Fermanian JD (2005) Goodness-of-fit tests for copulas. J Multivariate Anal 95:119–152

  • Fermanian JD, Radulovic D, Wegkamp M (2004) Weak convergence of empirical copula processes. Bernoulli 10:847–860

  • Friedman JH, Stuetzle W (1981) Projection pursuit regression. J Am Stat Assoc 76:817–823

  • Friedman JH, Tukey JW (1974) A projection pursuit algorithm for exploratory data analysis. IEEE Trans Comput 23:881–890

  • Genest C, Quessy JF, Rémillard B (2007) Asymptotic local efficiency of Cramér–von Mises tests for multivariate independence. Ann Stat 35:166–191

  • Genest C, Rémillard B (2008) Validity of the parametric bootstrap for goodness-of-fit testing in semiparametric models. Ann Inst Henri Poincaré Probab Stat 44:1096–1127

  • Hallin M, Paindaveine D (2002) Optimal tests for multivariate location based on interdirections and pseudo-Mahalanobis ranks. Ann Stat 30:1103–1133

  • Heathcote C, Rachev S, Cheng B (1995) Testing multivariate symmetry. J Multivariate Anal 54(1):91–112

  • Jones M, Sibson R (1987) What is projection pursuit? J R Stat Soc Ser A 150:1–36

  • Ley C (2010) Univariate and multivariate symmetry: statistical inference and distributional aspects. PhD thesis, Université libre de Bruxelles

  • Marden J (1999) Multivariate rank test. In: Multivariate analysis, design of experiments, and survey sampling, chap 14, pp 401–432. CRC Press

  • Mason DM, Schuenemeyer JH (1983) A modified Kolmogorov–Smirnov test sensitive to tail alternatives. Ann Stat 11:933–946

  • Neuhaus G, Zhu L (1998) Permutation tests for reflected symmetry. J Multivariate Anal 67(2):129–153

  • Padgett WJ, Taylor RL (1973) Laws of large numbers for normed linear spaces and certain Fréchet spaces. Springer, Berlin

  • Puri ML, Sen PK (1971) Nonparametric methods in multivariate analysis. Wiley, New York, p 440

  • Sen PK, Chatterjee SK (1973) On Kolmogorov–Smirnov type test for symmetry. Ann Inst Stat Math 25:288–300

  • Sen PK, Puri ML (1967) On the theory of rank order tests for location in the multivariate one sample problem. Ann Math Stat 38:1216–1228

  • Serfling R (2006) Multivariate symmetry and asymmetry. In: Encyclopedia of statistical sciences, 2nd edn, vol 8, pp 5338–5345. Wiley

  • Shohat JA, Tamarkin JD (1943) The problem of moments. Mathematical Surveys and Monographs

  • Székely GJ, Rizzo ML (2013) The distance correlation t-test of independence in high dimension. J Multivariate Anal 117:193–213

  • Székely GJ, Rizzo ML, Bakirov N (2007) Measuring and testing dependence by correlation of distances. Ann Stat 35:2769–2794

  • Takács L (1967) Combinatorial methods in the theory of stochastic processes. Wiley, Hoboken

  • Wilks SS (1935) On the independence of k sets of normally distributed statistical variables. Econometrica 3:309–326

  • Zhang X, Begleiter H, Porjesz B, Wang W, Litke A (1995) Event related potentials during object recognition tasks. Brain Res Bull 38(6):531–538


Acknowledgements

The authors gratefully acknowledge the constructive comments of the referees, which helped to improve the quality of the paper significantly. This work was partially supported by grant ID2014-48, CSIC, Udelar.

Author information

Correspondence to Leonardo Moreno.

Appendix.

Proof of Theorem 3

Let M and Q be the probability measures induced by the random elements \(\mathbf{X }\) and \(-\mathbf{X }\). Then

$$\begin{aligned} \mathcal {E}(M,Q)= & {} \{ h \in E^* / Q_{h}=M_{h} \} \\= & {} \{ h \in E^* / h(\mathbf{X }) \text { and } -h(\mathbf{X }) \text { have the same distribution} \}\\= & {} \mathcal {E}(\mathbf{X }). \end{aligned}$$

Since \(\mathcal {E}(M,Q)=\mathcal {E}(\mathbf{X })\) has positive \(\mu \)-measure, Theorem 2 yields \(Q=M\); that is, \(\mathbf{X }\) and \(-\mathbf{X }\) have the same distribution, so \(\mathbf{X }\) is a centrally symmetric random element.

Proof of Theorem 4

Let M and Q be the probability measures on \(E \times E\) whose values on the determining class of half-spaces \(A=\{ (x,y) \in E \times E / h(x,y) \le t \}\), with \(h \in (E \times E)^*\) and \(t \in \mathbb {R}\), are

$$\begin{aligned} \begin{array}{l} M(A) = \int \mathbf 1 _{ \{ h(x,y) \le t \}} dP_{\mathbf{X }}(x) dP_{\mathbf{Y }}(y), \quad \forall h \in (\textit{E} \times \textit{E})^*, \quad \forall t \in \mathbb {R}, \\ Q(A) = \int \mathbf 1 _{ \{ h(x,y) \le t \}} dP_{(\mathbf{X },\mathbf{Y })}(x,y), \quad \forall h \in (\textit{E} \times \textit{E})^*, \quad \forall t \in \mathbb {R}. \\ \end{array} \end{aligned}$$
(23)

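Thus M is the product of the marginal laws of \(\mathbf{X }\) and \(\mathbf{Y }\), and Q is the joint law of \((\mathbf{X },\mathbf{Y })\). We equip \(E\times E\) with the maximum norm, namely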

$$\begin{aligned} \Vert (x,y) \Vert = \max \{ \Vert x \Vert _{E} ,\Vert y \Vert _{\textit{E}} \}, \end{aligned}$$
(24)

therefore, since \((\max \{a,b\})^n \le a^n + b^n\) for \(a, b \ge 0\),

$$\begin{aligned} m_M(n)= & {} \int \Vert (x,y) \Vert ^n dP_{\mathbf{X }}(x) dP_{\mathbf{Y }}(y) \\\le & {} \left[ \underbrace{\int \Vert x \Vert ^n dP_{\mathbf{X }}(x)}_{m_{\mathbf{X }}(n)} + \underbrace{\int \Vert y \Vert ^n dP_{\mathbf{Y }}(y)}_{m_{\mathbf{Y }}(n)} \right] \\\le & {} 2 \max \{ m_{\mathbf{X }}(n), m_{\mathbf{Y }}(n)\}. \end{aligned}$$

So,

$$\begin{aligned} m^{-1/n}_M(n)\ge & {} 2^{-1/n} \left[ \max \{ m_{\mathbf{X }}(n), m_{\mathbf{Y }}(n)\} \right] ^{-1/n} \\\ge & {} 2^{-1/n} \left[ \min \{ m^{-1/n}_{\mathbf{X }}(n), m^{-1/n}_{\mathbf{Y }}(n)\} \right] . \end{aligned}$$

Thus, the moments are finite, and Carleman’s condition for the measure M holds.

Let \(\mathcal {E}(M,Q) = \{ h \in (E \times E)^* / Q_{h}=M_{h} \}\) and let \(h \in \mathcal {E}(\mathbf{X },\mathbf{Y })\). Setting \(f(x)= h(x,0)\) and \(g(y)=h(0,y)\), the linearity of h gives \(h(\mathbf{X },\mathbf{Y })=f(\mathbf{X })+g(\mathbf{Y })\). Then,

$$\begin{aligned} Q_{h} \left( (-\infty ,t] \right)= & {} P \left( f(\mathbf{X })+g(\mathbf{Y }) \le t \right) \\= & {} \int _{E \times E} \mathbf 1 _{ \{ f(x) + g(y) \le t \}} dP_{(\mathbf{X },\mathbf{Y })}(x,y)\\= & {} \int _{\mathbb {R} \times \mathbb {R} } \mathbf 1 _{ \{ u + v \le t \}} dP_{(f(\mathbf{X }),g(\mathbf{Y }))}(u,v) \\= & {} \int _{\mathbb {R} \times \mathbb {R} } \mathbf 1 _{ \{ u + v \le t \}} dP_{f(\mathbf{X })}(u) dP_{g(\mathbf{Y })}(v)\\= & {} M_{h} \left( (-\infty ,t] \right) . \end{aligned}$$

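where the fourth equality uses the independence of \(f(\mathbf{X })\) and \(g(\mathbf{Y })\), which holds because \(h \in \mathcal {E}(\mathbf{X },\mathbf{Y })\). Hence \(h \in \mathcal {E}(M,Q)\), so \(\mathcal {E}(\mathbf{X },\mathbf{Y }) \subseteq \mathcal {E}(M,Q)\) and \(\mathcal {E}(M,Q)\) has positive \(\mu \)-measure. By Theorem 2, \(M=Q\), and thus \(\mathbf{X }\) and \(\mathbf{Y }\) are independent.\(\square \)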
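The mechanism of this proof, comparing the law of a linear functional \(h(x,y)=f(x)+g(y)\) under the joint law Q with its law under the product M, can be illustrated numerically. The following is a minimal Python sketch, not the test statistic of the paper: the function names `ks_distance`, `projection_gap` and `max_random_projection_gap` are hypothetical, the product measure M is approximated by randomly permuting the rows of the Y sample, the two projected samples are compared with a two-sample Kolmogorov–Smirnov distance, and the random directions are assumed to be Gaussian.

```python
import numpy as np


def ks_distance(u, v):
    """Two-sample Kolmogorov-Smirnov distance sup_t |F_u(t) - F_v(t)|."""
    u, v = np.sort(u), np.sort(v)
    grid = np.concatenate([u, v])  # both empirical cdfs jump only at these points
    fu = np.searchsorted(u, grid, side="right") / u.size
    fv = np.searchsorted(v, grid, side="right") / v.size
    return float(np.max(np.abs(fu - fv)))


def projection_gap(x, y, a, b, rng):
    """Compare the law of h(X, Y) = <a, X> + <b, Y> on the paired sample
    (empirical Q) with its law after randomly permuting the rows of y, which
    breaks the pairing and approximates the product measure M (an assumption
    of this sketch)."""
    joint = x @ a + y @ b
    product = x @ a + y[rng.permutation(len(y))] @ b
    return ks_distance(joint, product)


def max_random_projection_gap(x, y, n_proj, rng):
    """Largest gap over n_proj Gaussian random directions (a, b)."""
    return max(
        projection_gap(x, y, rng.standard_normal(x.shape[1]),
                       rng.standard_normal(y.shape[1]), rng)
        for _ in range(n_proj)
    )


rng = np.random.default_rng(0)
x = rng.standard_normal((2000, 1))
y = rng.standard_normal((2000, 1))
print(max_random_projection_gap(x, y, 20, rng))                      # independent: small
print(projection_gap(x, x, np.array([1.0]), np.array([-1.0]), rng))  # Y = X: large
```

With independent samples the gaps stay at the level of sampling noise, whereas for \(\mathbf{Y}=\mathbf{X}\) the direction \(a=1\), \(b=-1\) makes the projection identically zero under Q but not under the permuted (product) sample. Permutation is used only because it is a simple way to draw from an approximation of \(P_{\mathbf{X}} \otimes P_{\mathbf{Y}}\).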

Proof of Theorem 5

Under \(H_0\), \(\mathbf{X }_1\) is symmetric, so by Theorem 1, \(h(\mathbf{X }_1)\in \mathbb {R}\) is also symmetric for every \(h \in E^*\) and therefore satisfies condition (11) for the null hypothesis in the one-dimensional case. Let \(h(\mathbf{Y }_1) \ge h(\mathbf{Y }_2) \ge \ldots \ge h(\mathbf{Y }_n)\) be the order statistics (sorted from largest to smallest) of the absolute values \(\left| h(\mathbf{X }_1) \right| , \left| h(\mathbf{X }_2) \right| , \ldots ,\left| h(\mathbf{X }_n) \right| \), and let \(t^{h}_{n,i}= F^{h}(-h(\mathbf{Y }_{i}))\), where \(F = F^{h}\) denotes the continuous distribution function of \(h(\mathbf{X }_1)\). Then \(0 \le t^{h}_{n,1} \le t^{h}_{n,2} \le \ldots \le t^{h}_{n,n} \le F(0)= 1/2\). Because F is continuous, ties do not occur a.s., and therefore

$$\begin{aligned} 0< t^{h}_{n,1}< t^{h}_{n,2}< \ldots< t^{h}_{n,n}< F(0)= 1/2 \quad \text {a.s.} \end{aligned}$$

Via the canonical transformation, define \(V^{h}_n(t)= n^{1/2}[G^{h}_n(t)-t] \), \(0<t<1\), where \(G^{h}_n(t)= \frac{1}{n}\sum _{i=1}^{n} \mathbf 1 _{(-\infty , t )} \left( F^{h}(h(\mathbf{X }_i)) \right) \), and define

$$\begin{aligned} \tilde{V}^{h}_n(t)= V^{h}_n(t^{-}) + V^{h}_n(1-t) , \quad 0 \le t \le 1/2. \end{aligned}$$
(25)

Then \(\tilde{V}^{h}_n(t)\) is a stochastic process on (0, 1/2) with n jumps, located at the points \(t^{h}_{n,1},t^{h}_{n,2}, \ldots , t^{h}_{n,n}\), each of size \(n^{-1/2}\) in absolute value; equivalently, \(n^{1/2}\tilde{V}^{h}_n\) has jumps of \(+1\) or \(-1\) at these points. Let \(p_{i,j}= P \left( h(\mathbf{Y }_{n-i+1})= \vert h(\mathbf{X }_j) \vert \right) \). Then

$$\begin{aligned} P \left( h(\mathbf{Y }_{n-i+1}) \text { corresponds to a positive value among } h(\mathbf{X }_1), \ldots , h(\mathbf{X }_n) \right)&= \sum _{j=1}^{n} p_{i,j}\, P \left( h(\mathbf{X }_j) >0 \,\Big |\, h(\mathbf{Y }_{n-i+1})= \vert h(\mathbf{X }_j) \vert \right) \\&= \frac{1}{2} \sum _{j=1}^{n} p_{i,j}= \frac{1}{2}. \end{aligned}$$
(26)

Let sg(.) stand for the sign function, and \(\vert \cdot \vert \) for the absolute value function. It is well known that the vectors

$$\begin{aligned} \left( sg \left( h(\mathbf{X }_1) \right) , \ldots , sg \left( h(\mathbf{X }_n) \right) \right) \ \text{ and } \left( \left| { h(\mathbf{X }_1)} \right| ,\ldots ,\left| { h(\mathbf{X }_n)} \right| \right) \end{aligned}$$

are independent under the null assumption.

Consequently, the jumps of \(n ^{1/2} \tilde{V}^{h}_n(t)\) at \(t^{h}_{n,1},t^{h}_{n,2}, \ldots , t^{h}_{n,n}\) are independent, each equal to \(+1\) or \(-1\) with probability 1/2 by (26). Therefore, under \(H_0\), the statistics \(D^h_-(n)\) and \(D^h_+(n)\) are distributed as the maximum of a symmetric random walk of n steps started at the origin, and \(n D^{h}(n)\) is distributed as the maximum of the absolute values of that random walk; see Takács (1967).\(\square \)
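The random-walk description can be made concrete. The Python sketch below builds the walk from one sample of projections \(h(\mathbf{X}_1),\ldots,h(\mathbf{X}_n)\) by taking signs in order of decreasing absolute value (the order of the jump points \(t^h_{n,1}<\cdots<t^h_{n,n}\)), then computes exact null tail probabilities of \(\max_k S_k\) by the reflection principle and of \(\max_k |S_k|\) by dynamic programming; the latter is swapped in for the closed-form expressions in Takács (1967). All function names are hypothetical, the step convention (a negative projection counts \(+1\)) is our reading of (25) and is immaterial under \(H_0\) by symmetry, and the normalization linking these maxima to \(D^h_\pm(n)\) and \(D^h(n)\) in the main text is assumed rather than reproduced.

```python
from fractions import Fraction
from math import comb

import numpy as np


def sign_walk(projections):
    """S_0 = 0, S_1, ..., S_n: partial sums of the signs of the projections,
    taken in order of decreasing absolute value. A negative projection adds +1,
    a positive one adds -1; a zero projection (probability 0 when F is
    continuous) would add 0."""
    p = np.asarray(projections, dtype=float)
    order = np.argsort(-np.abs(p), kind="stable")
    return np.concatenate([[0.0], np.cumsum(-np.sign(p[order]))])


def p_end(n, k):
    """P(S_n = k) for a simple symmetric random walk started at 0."""
    if abs(k) > n or (n + k) % 2:
        return Fraction(0)
    return Fraction(comb(n, (n + k) // 2), 2 ** n)


def p_max_at_least(n, r):
    """P(max_{0<=k<=n} S_k >= r) for r >= 0, by the reflection principle:
    P(M_n >= r) = P(S_n = r) + 2 P(S_n > r)."""
    tail = sum((p_end(n, k) for k in range(r + 1, n + 1)), Fraction(0))
    return p_end(n, r) + 2 * tail


def p_abs_max_at_least(n, r):
    """P(max_{0<=k<=n} |S_k| >= r): track the walk on the states
    -(r-1), ..., r-1 and discard the probability mass that reaches +-r."""
    if r <= 0:
        return Fraction(1)
    probs = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for s, p in probs.items():
            for t in (s - 1, s + 1):
                if abs(t) < r:
                    nxt[t] = nxt.get(t, Fraction(0)) + p / 2
        probs = nxt
    return 1 - sum(probs.values(), Fraction(0))


walk = sign_walk([0.3, -2.0, 1.5, -0.1])          # walk values 0, 1, 0, -1, 0
r_obs = int(np.abs(walk).max())                   # observed max |S_k| = 1
print(p_abs_max_at_least(len(walk) - 1, r_obs))   # P(max |S_k| >= 1) = 1
```

Exact `Fraction` arithmetic keeps the tail probabilities free of rounding error, which is convenient for small n; for large n, floating-point arithmetic in the same recursion would be faster.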

Proof of Theorem 6

Write P for the distribution of X and Q for the distribution of \(-X\). The set \(\mathcal {E}(P,Q)\) has H-measure zero in \(\mathbb {R}^d\): if its H-measure were positive, Theorem 3 would imply that X and \(-X\) have the same distribution, contradicting the alternative hypothesis.

For each \(h \in \mathcal {E}^{c}(P,Q)\), by Theorem 1, \(h(\mathbf{X })\) is not symmetric, so if we define

$$\begin{aligned} \delta (F^{h})= \sup _{x \ge 0} \vert F^{h}(x)+ F^{h}(-x) -1 \vert , \end{aligned}$$

it holds that

$$\begin{aligned} \delta (F^{h}) = \left\{ \begin{array}{l@{\quad }l} 0 &{} \text {if } F ^{h}\in \mathcal {F}^{h}_0\\ >0 &{} \text {if } F ^{h}\in \mathcal {F}^{h}_1. \end{array} \right. \end{aligned}$$
(27)

Under \(H_1\), for each such h there exists \(t_h \in \mathbb {R}\) such that

$$\begin{aligned} P \left( x \in \mathbb {R}^d / \langle x, h \rangle \le t_h \right) \ne Q \left( x \in \mathbb {R}^d / \langle x, h \rangle \le t_h \right) , \end{aligned}$$
(28)

namely,

$$\begin{aligned} F ^{h}(t_h)+ F ^{h}(-t_h) - 1 \ne 0. \end{aligned}$$
(29)

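By the Glivenko–Cantelli theorem, \(\sup _{x } \vert F^{h}_n(x)- F^{h}(x) \vert \mathop {\longrightarrow }\limits ^{a.s}0\). Choosing \(t_h\) in (28) so that \(\vert F ^{h}(t_h)+ F ^{h}(-t_h) - 1 \vert > \delta (F^{h})/2\), which is possible by the definition of \(\delta (F^{h})\) as a supremum, this entails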

$$\begin{aligned} D^{h}_n&\ge \vert F_n^{h}(t_h)+ F_n^{h}(-t_h) - 1 \vert \\&\ge \vert F ^{h}(t_h)+ F ^{h}(-t_h) - 1 \vert - \vert F^{h}_n(t_h)- F^{h}(t_h) \vert - \vert F^{h}_n(-t_h)- F^{h}(-t_h)\vert \\&\ge \frac{\delta (F^{h})}{2}, \end{aligned}$$

almost surely for all sufficiently large n.
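The consistency argument can be checked empirically on a single projection. The sketch below assumes the statistic is the empirical analogue of \(\delta(F^h)\), computed as \(\sup_t |F_n(t)-G_n(t)|\) with \(G_n\) the empirical distribution function of the reflected sample \(-X_1,\ldots,-X_n\); because \(t \mapsto F(t)+F(-t)-1\) is even, for continuous F the population version equals \(\delta(F)\). The name `symmetry_discrepancy` is hypothetical and its normalization may differ from that of \(D^h_n\) in the main text. Under symmetry the value shrinks toward 0 as n grows, while for the centered exponential distribution, for which \(\delta = 1-2/e \approx 0.264\), it stays near that value, in line with the bound \(D^h_n \ge \delta(F^h)/2\).

```python
import numpy as np


def symmetry_discrepancy(sample):
    """sup_t |F_n(t) - G_n(t)|, where F_n is the empirical cdf of the sample and
    G_n that of the reflected sample. For continuous F the population version
    sup_t |F(t) - (1 - F(-t))| equals delta(F)."""
    x = np.sort(np.asarray(sample, dtype=float))
    r = np.sort(-x)
    grid = np.concatenate([x, r])  # both step functions jump only at these points
    f = np.searchsorted(x, grid, side="right") / x.size
    g = np.searchsorted(r, grid, side="right") / r.size
    return float(np.max(np.abs(f - g)))


rng = np.random.default_rng(1)
for n in (200, 2000, 20000):
    sym = rng.standard_normal(n)           # symmetric: statistic tends to 0
    skew = rng.exponential(size=n) - 1.0   # centered Exp(1): delta = 1 - 2/e
    print(n, symmetry_discrepancy(sym), symmetry_discrepancy(skew))
```

For data without ties, n times this quantity equals \(\max_{0\le k\le n}|S_k|\), the largest absolute partial sum of the signs taken in decreasing order of absolute value, which agrees with the random-walk description in the proof of Theorem 5.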


Cite this article

Fraiman, R., Moreno, L. & Vallejo, S. Some hypothesis tests based on random projection. Comput Stat 32, 1165–1189 (2017). https://doi.org/10.1007/s00180-017-0732-4


  • DOI: https://doi.org/10.1007/s00180-017-0732-4
