
On some consistent tests of mutual independence among several random vectors of arbitrary dimensions


Abstract

Testing for mutual independence among several random vectors is a challenging problem, and in recent years, it has received considerable attention in the statistics and machine learning literature. Most of the existing tests of independence deal with only two random vectors, and they do not have straightforward generalizations for testing mutual independence among more than two random vectors of arbitrary dimensions. On the other hand, there are various tests of mutual independence among several random variables, but these univariate tests do not have natural multivariate extensions. In this article, we propose two general recipes, one based on inter-point distances and the other based on linear projections, for constructing multivariate extensions of these univariate tests. Under appropriate regularity conditions, the resulting tests are consistent whenever the corresponding univariate tests are consistent. We carry out extensive numerical studies to compare the empirical performance of the proposed methods with that of state-of-the-art methods.
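
To make the projection-based recipe concrete, the sketch below projects each random vector onto a random unit direction and applies a simple univariate mutual-independence statistic to the projections (here, the sum of squared pairwise Spearman rank correlations, chosen purely for illustration), calibrated by independently permuting the blocks. All function names, the particular univariate statistic, and the use of a single random projection are illustrative assumptions, not the exact procedure studied in the article.

```python
import numpy as np
from scipy.stats import rankdata

def projected_statistic(samples, directions):
    # Project each (n, d_i) block onto its unit direction and aggregate
    # pairwise Spearman correlations of the p resulting univariate samples.
    proj = np.column_stack([X @ a for X, a in zip(samples, directions)])
    ranks = np.apply_along_axis(rankdata, 0, proj)
    rho = np.corrcoef(ranks, rowvar=False)      # p x p rank-correlation matrix
    iu = np.triu_indices_from(rho, k=1)
    return float(np.sum(rho[iu] ** 2))          # sum of squared off-diagonal entries

def projection_permutation_test(samples, n_perm=500, seed=0):
    # samples: list of arrays, each of shape (n, d_i), one per random vector.
    rng = np.random.default_rng(seed)
    directions = [rng.standard_normal(X.shape[1]) for X in samples]
    directions = [a / np.linalg.norm(a) for a in directions]
    observed = projected_statistic(samples, directions)
    n = samples[0].shape[0]
    exceed = 0
    for _ in range(n_perm):
        # Permuting each block independently enforces mutual independence under H0.
        permuted = [X[rng.permutation(n)] for X in samples]
        exceed += projected_statistic(permuted, directions) >= observed
    return (1 + exceed) / (1 + n_perm)           # permutation p-value

# Toy usage: three blocks of dimensions 3, 2 and 4; the first two share a common factor.
rng = np.random.default_rng(1)
Z = rng.standard_normal((200, 1))
X1 = Z + rng.standard_normal((200, 3))
X2 = Z + rng.standard_normal((200, 2))
X3 = rng.standard_normal((200, 4))               # independent of the others
print(projection_permutation_test([X1, X2, X3]))
```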




References

  • Bilodeau, M., Nangue, A.G.: Tests of mutual or serial independence of random vectors with applications. J. Mach. Learn. Res. 18, 2518–2557 (2017)


  • Biswas, M., Sarkar, S., Ghosh, A.K.: On some exact distribution-free tests of independence between two random vectors of arbitrary dimensions. J. Stat. Plan. Inf. 175, 78–86 (2016)


  • Breitenberger, E.: Analogues of the normal distribution on the circle and the sphere. Biometrika 50, 81–88 (1963)


  • Chakraborty, S., Zhang, X.: Distance metrics for measuring joint dependence with application to causal inference. J. Am. Stat. Assoc. 114, 1638–1650 (2019)


  • Fan, Y., Lafaye de Micheaux, P., Penev, S., Salopek, D.: Multivariate nonparametric test of independence. J. Multivar. Anal. 153, 189–210 (2017)


  • Ferraty, F., Vieu, P.: Nonparametric Functional Data Analysis: Theory and Practice. Springer, Berlin (2006)


  • Friedman, J.H., Rafsky, L.C.: Graph-theoretic measures of multivariate association and prediction. Ann. Stat. 11, 377–391 (1983)


  • Fukumizu, K., Gretton, A., Lanckriet, G.R., Schölkopf, B., Sriperumbudur, B.K.: Kernel choice and classifiability for RKHS embeddings of probability distributions. Adv. Neural Inf. Process. Syst. 1750–1758 (2009)

  • Gaißer, S., Ruppert, M., Schmid, F.: A multivariate version of Hoeffding’s phi-square. J. Multivar. Anal. 101, 2571–2586 (2010)


  • Ghosh, A.K., Chaudhuri, P., Murthy, C.: On visualization and aggregation of nearest neighbor classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 27, 1592–1602 (2005)


  • Ghosh, A.K., Chaudhuri, P., Sengupta, D.: Classification using kernel density estimates: multiscale analysis and visualization. Technometrics 48, 120–132 (2006)


  • Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. J. Mach. Learn. Res. 13, 723–773 (2012)


  • Gretton, A., Fukumizu, K., Teo, C.H., Song, L., Schölkopf, B., Smola, A.: A kernel statistical test of independence. Adv. Neural Inf. Process. Syst. 585–592 (2007)

  • Gretton, A., Györfi, L.: Consistent nonparametric tests of independence. J. Mach. Learn. Res. 11, 1391–1423 (2010)


  • Heller, R., Gorfine, M., Heller, Y.: A class of multivariate distribution-free tests of independence based on graphs. J. Stat. Plan. Inf. 142, 3097–3106 (2012)


  • Heller, R., Heller, Y.: Multivariate tests of association based on univariate tests. Adv. Neural Inf. Process. Syst. 208–216 (2016)

  • Heller, R., Heller, Y., Gorfine, M.: A consistent multivariate test of association based on ranks of distances. Biometrika 100, 503–510 (2013)


  • Huang, C., Huo, X.: A statistically and numerically efficient independence test based on random projections and distance covariance. arXiv preprint arXiv:1701.06054 (2017)

  • Jin, Z., Matteson, D.S.: Generalizing distance covariance to measure and test multivariate mutual dependence via complete and incomplete v-statistics. J. Multivar. Anal. 168, 304–322 (2018)


  • Mardia, K.V., Jupp, P.E.: Directional Statistics. Wiley, New York (2009)


  • McDonald, G.C., Schwing, R.C.: Instabilities of regression estimates relating air pollution to mortality. Technometrics 15, 463–481 (1973)


  • Nelsen, R.B.: Nonparametric measures of multivariate association. Lecture Notes-Monograph Series 223–232 (1996)

  • Nelsen, R.B.: An Introduction to Copulas. Springer, New York (2006)


  • Newton, M.A.: Introducing the discussion paper by Székely and Rizzo. Ann. Appl. Stat. 3, 1233–1235 (2009)


  • Pfister, N., Bühlmann, P., Schölkopf, B., Peters, J.: Kernel-based tests for joint independence. J. R. Stat. Soc. Ser. B 80, 5–31 (2018)


  • Póczos, B., Ghahramani, Z., Schneider, J.: Copula-based kernel dependency measures. In: Proceedings of 29th International Conference on Machine Learning, pp. 1635–1642 (2012)

  • Quadrianto, N., Song, L., Smola, A.J.: Kernelized sorting. Adv. Neural Inf. Process. Syst. 1289–1296 (2009)

  • Rawat, R., Sitaram, A.: Injectivity sets for spherical means on \({\mathbb{R}}^n\) and on symmetric spaces. J. Fourier Anal. Appl. 6, 343–348 (2000)


  • Roy, A., Ghosh, A. K., Goswami, A., Murthy, C. A.: Some new copula based distribution-free tests for independence among several random variables. Sankhya, Series A. To appear (2020). https://doi.org/10.1007/s13171-020-00207-2

  • Sarkar, S., Ghosh, A.K.: Some multivariate tests of independence based on ranks of nearest neighbors. Technometrics 60, 101–111 (2018)


  • Sejdinovic, D., Sriperumbudur, B., Gretton, A., Fukumizu, K.: Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Ann. Stat. 41, 2263–2291 (2013)


  • Székely, G.J., Rizzo, M.L., Bakirov, N.K.: Measuring and testing dependence by correlation of distances. Ann. Stat. 35, 2769–2794 (2007)


  • Úbeda-Flores, M.: Multivariate versions of Blomqvist’s beta and Spearman’s footrule. Ann. Inst. Stat. Math. 57, 781–788 (2005)



Acknowledgements

We thank the anonymous reviewers for their careful reading of earlier versions of this article and for several helpful comments.

Author information


Corresponding author

Correspondence to Angshuman Roy.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix


Lemma 1

For any fixed \(\delta >0\), define \(p_{n,\sigma }(\delta ,F)= \sup _{\mathbf{a}} \Pr (\left| {{\mathbb {T}}}_{n,\sigma }(F^{\mathbf{a}})-{\mathbb T}_{\sigma }(F^{\mathbf{a}})\right| >\delta )\). If m(n) is a polynomial function of n, then \(\sum _{n=1}^\infty m(n)p_{n,\sigma }(\delta ,F)<\infty \).

Proof

Recall that \({{\mathbb {T}}}_{\sigma }(F^{\mathbf{a}})\) and \({\mathbb T}_{n,\sigma }(F^{\mathbf{a}})\) are defined as

$$\begin{aligned} {\mathbb T}_{\sigma }(F^{\mathbf{a}})=\frac{\gamma _{\sigma }(C^{\mathbf{a}},\varPi )}{\gamma _{\sigma }(M,\varPi )} ~\text{ and }~~ {\mathbb T}_{n,\sigma }(F^{\mathbf{a}})=\frac{\gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)}{\gamma _{\sigma }(M_n,\varPi _n)}, \end{aligned}$$

where \(C^{\mathbf{a}}\), \(M\), \(\varPi \) and \(C^{\mathbf{a}}_n\), \(M_n\), \(\varPi _n\) are as defined in Section 2.2.

Now, observe that

$$\begin{aligned}&\Pr (\left| {{\mathbb {T}}}_{n,\sigma }(F^\mathbf{a})-{\mathbb T}_\sigma (F^\mathbf{a})\right|>\delta ) \\&\quad =\Pr \left( \left| \frac{\gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)}{\gamma _{\sigma }(M_n,\varPi _n)}-\frac{\gamma _{\sigma }(C^{\mathbf{a}},\varPi )}{\gamma _{\sigma }(M,\varPi )}\right|>\delta \right) \\&\quad \le \text {Pr}\left( \left| \frac{\gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)}{\gamma _{\sigma }(M_n,\varPi _n)}-\frac{\gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)}{\gamma _{\sigma }(M,\varPi )}\right|>\delta /2\right) \\&\qquad +\text {Pr}\left( \left| \frac{\gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)}{\gamma _{\sigma }(M,\varPi )}-\frac{\gamma _{\sigma }(C^{\mathbf{a}},\varPi )}{\gamma _{\sigma }(M,\varPi )}\right| >\delta /2\right) . \end{aligned}$$
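
The inequality above is the usual combination of the triangle inequality with a union bound: writing \(U_n=\gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)/\gamma _{\sigma }(M_n,\varPi _n)\), \(V_n=\gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)/\gamma _{\sigma }(M,\varPi )\) and \(W=\gamma _{\sigma }(C^{\mathbf{a}},\varPi )/\gamma _{\sigma }(M,\varPi )\) (shorthand used only in this step), we have

$$\begin{aligned} \{|U_n-W|>\delta \}\subseteq \{|U_n-V_n|>\delta /2\}\cup \{|V_n-W|>\delta /2\}, \end{aligned}$$

since \(|U_n-V_n|\le \delta /2\) and \(|V_n-W|\le \delta /2\) together imply \(|U_n-W|\le \delta \).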

First consider the term

$$\begin{aligned}&\left| \frac{\gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)}{\gamma _{\sigma }(M_n,\varPi _n)}-\frac{\gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)}{\gamma _{\sigma }(M,\varPi )}\right| \\&\quad =\frac{\gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)}{\gamma _{\sigma }(M_n,\varPi _n)\gamma _{\sigma }(M,\varPi )}~\times \left| \gamma _{\sigma }(M_n,\varPi _n)-\gamma _{\sigma }(M,\varPi )\right| . \end{aligned}$$

Note that \({\gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)}/\big ({\gamma _{\sigma }(M_n,\varPi _n)\gamma _{\sigma }(M,\varPi )}\big )\) is uniformly bounded and \(\left| \gamma _{\sigma }(M_n,\varPi _n)-\gamma _{\sigma }(M,\varPi )\right| \) is a non-random quantity that converges to 0 as n tends to infinity (see Roy et al. 2020, Lemma L2). Therefore, there exists \(n_0\ge 1\) such that for all \(n >n_0\),

$$\begin{aligned}&\left| \frac{\gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)}{\gamma _{\sigma }(M_n,\varPi _n)}-\frac{\gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)}{\gamma _{\sigma }(M,\varPi )}\right| < \delta /2, ~~\text{ i.e., }\\&\text {Pr}\left( \left| \frac{\gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)}{\gamma _{\sigma }(M_n,\varPi _n)}-\frac{\gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)}{\gamma _{\sigma }(M,\varPi )}\right| >\delta /2\right) =0. \end{aligned}$$

Note that this \(n_0\) does not depend on \(\mathbf{a}\). Again,

$$\begin{aligned}&\text {Pr}\left( \left| \frac{\gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)}{\gamma _{\sigma }(M,\varPi )}-\frac{\gamma _{\sigma }(C^{\mathbf{a}},\varPi )}{\gamma _{\sigma }(M,\varPi )}\right|>\delta /2\right) \\&\quad =\text {Pr}\left( \left| \gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)-\gamma _{\sigma }(C^{\mathbf{a}},\varPi )\right| >\delta ^{*}\right) , \end{aligned}$$

where \(\delta ^*=\gamma _{\sigma }(M,\varPi )\frac{\delta }{2}\) is a positive constant. So, it is enough to show the finiteness of

$$\begin{aligned} \sum _{n=1}^{\infty } m(n) \sup _{\mathbf{a}}\text {Pr}\Bigl (\left| \gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)-\gamma _{\sigma }(C^{\mathbf{a}},\varPi )\right| >\delta ^{*}\Bigr ). \end{aligned}$$

Let \(F^{(\mathbf{a},1)},\ldots ,F^{(\mathbf{a},p)}\) be the distribution functions of \(X^{(\mathbf{a},1)},\ldots ,X^{(\mathbf{a},p)}\), respectively. Also, define \(C^{\mathbf{a}*}_n\) to be the empirical joint distribution of \(\left( F^{(\mathbf{a},1)}(X^{(\mathbf{a},1)}_1),\ldots ,F^{(\mathbf{a},p)}(X^{(\mathbf{a},p)}_1)\right) ,\ldots ,\left( F^{(\mathbf{a},1)}(X^{(\mathbf{a},1)}_n),\ldots ,F^{(\mathbf{a},p)}(X^{(\mathbf{a},p)}_n)\right) \). Then, following Theorem 6 of Roy et al. (2020), we get

$$\begin{aligned}&\text {Pr}\Bigl (\left| \gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)-\gamma _{\sigma }(C^{\mathbf{a}},\varPi )\right|>\delta ^{*}\Bigr )\\&\quad \le \text {Pr}\left( \left| \gamma _{\sigma }^2(C^{\mathbf{a}}_n,\varPi _n)-\gamma _{\sigma }^2(C^{\mathbf{a}}_n,\varPi )\right| ^{\frac{1}{2}}\right. \\&\qquad +\gamma _{\sigma }(C^{\mathbf{a}}_n,C^{\mathbf{a}*}_n)+\gamma _{\sigma }(C^{\mathbf{a}*}_n,C^{\mathbf{a}})>\delta ^{*}\Bigr )\\&\quad \le \text {Pr}\left( \left| \gamma _{\sigma }^2(C^{\mathbf{a}}_n,\varPi _n)-\gamma _{\sigma }^2(C^{\mathbf{a}}_n,\varPi )\right| ^{\frac{1}{2}}>\delta ^{*}/3\right) \\&\qquad +\text {Pr}\Bigl (\gamma _{\sigma }(C^{\mathbf{a}}_n,C^{\mathbf{a}*}_n)>\delta ^{*}/3\Bigr )+\text {Pr}\Bigl (\gamma _{\sigma }(C^{\mathbf{a}*}_n,C^{\mathbf{a}})>\delta ^{*}/3\Bigr ). \end{aligned}$$

Now, from Lemma L2 in Roy et al. (2020), we can show that there exists \(n_1\ge 1\) (\(n_1\) does not depend on \(\mathbf{a}\)) such that for all \(n>n_1\), \(\text {Pr}\left( \left| \gamma _{\sigma }^2(C^{\mathbf{a}}_n,\varPi _n)-\gamma _{\sigma }^2(C^{\mathbf{a}}_n,\varPi )\right| ^{\frac{1}{2}}> \delta ^{*}/3\right) =0\). Again, from Lemmas L3 and L4 in Roy et al. (2020), we have \(\text {Pr}\Bigl (\gamma _{\sigma }(C^{\mathbf{a}}_n,C^{\mathbf{a}*}_n)>\delta ^{*}/3\Bigr )<2p\exp \left( -\frac{n{\delta ^*}^2}{18pL^2}\right) \) and \(\text {Pr}\left( \gamma _{\sigma }(C^{\mathbf{a}*}_n,C^{\mathbf{a}})>\delta ^{*}/{3}\right) <\exp \left( -\frac{n}{2}\left( \frac{\delta ^{*}}{3}-\frac{2}{\sqrt{n}}\right) ^2\right) \), respectively, where L is a positive constant independent of \(\mathbf{a}\). So, to prove the lemma, it is sufficient to show that

$$\begin{aligned}&\sum _{n=1}^{\infty } m(n) \left[ 2p\exp \left( -\frac{n{\delta ^*}^2}{18pL^2}\right) \right. \\&\qquad \left. +\exp \left( -\frac{n}{2}\left( \frac{\delta ^{*}}{3}-\frac{2}{\sqrt{n}}\right) ^2\right) \right] <\infty . \end{aligned}$$
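
Indeed, writing \(c_1={\delta ^*}^2/(18pL^2)\) and \(c_2={\delta ^*}^2/72\) (shorthand introduced only for this bound), for all sufficiently large \(n\) we have \(2/\sqrt{n}\le \delta ^{*}/6\), so that

$$\begin{aligned}&m(n)\left[ 2p\exp \left( -\frac{n{\delta ^*}^2}{18pL^2}\right) +\exp \left( -\frac{n}{2}\left( \frac{\delta ^{*}}{3}-\frac{2}{\sqrt{n}}\right) ^2\right) \right] \\&\quad \le m(n)\left( 2p\,e^{-c_1 n}+e^{-c_2 n}\right) . \end{aligned}$$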

Since \(m(n)\) is a polynomial function of \(n\), \(\sum _n m(n)e^{-cn}<\infty \) for any \(c>0\), and hence the series above converges by comparison. \(\square \)

Lemma 2

Consider a sequence \(\{\sigma (n): n\ge 1\}\), which converges to \(\sigma _0>0\). For any fixed \(\delta >0\), define \({\tilde{p}}_n(\delta ,F)=\sup _{\mathbf{a}} \Pr (\left| {\mathbb T}_{n,\sigma (n)}(F^{\mathbf{a}})-{\mathbb T}_{\sigma _0}(F^{\mathbf{a}})\right| >\delta )\). If m(n) is a polynomial function of n, then \(\sum _{n=1}^\infty m(n){\tilde{p}}_{n}(\delta ,F)<\infty \).

Proof

Note that for any fixed \(\mathbf{a}\),

$$\begin{aligned}&\Pr (\left| {{\mathbb {T}}}_{n,\sigma (n)}(F^{\mathbf{a}})-{\mathbb T}_{\sigma _0}(F^{\mathbf{a}})\right|>\delta )\\&\quad \le \Pr (\left| {{\mathbb {T}}}_{n,\sigma (n)}(F^{\mathbf{a}})-{\mathbb T}_{n,\sigma _0}(F^{\mathbf{a}})\right|>\delta /2)\\&\qquad +\Pr (\left| {{\mathbb {T}}}_{n,\sigma _0}(F^{\mathbf{a}})-{\mathbb T}_{\sigma _0}(F^{\mathbf{a}})\right| >\delta /2). \end{aligned}$$

So, in view of Lemma 1, it is enough to show that \(\sum _{n=1}^{\infty } m(n) \sup _{\mathbf{a}} \Pr (\left| {\mathbb T}_{n,\sigma (n)}(F^{\mathbf{a}})-{\mathbb T}_{n,\sigma _0}(F^{\mathbf{a}})\right| >\delta /2)\) is finite. Now, note that

$$\begin{aligned}&\left| {{\mathbb {T}}}_{n,\sigma (n)}(F^{\mathbf{a}})-{{\mathbb {T}}}_{n,\sigma _0}(F^{\mathbf{a}})\right| \\&\quad =\left| \frac{\gamma _{\sigma (n)}(C^{\mathbf{a}}_n,\varPi _n)}{\gamma _{\sigma (n)}(M_n,\varPi _n)}-\frac{\gamma _{\sigma _0}(C^{\mathbf{a}}_n,\varPi _n)}{\gamma _{\sigma _0}(M_n,\varPi _n)}\right| \\&\quad \le \gamma _{\sigma (n)}(C^{\mathbf{a}}_n,\varPi _n) \times \left| \frac{1}{\gamma _{\sigma (n)}(M_n,\varPi _n)}-\frac{1}{\gamma _{\sigma _0}(M_n,\varPi _n)}\right| \\&\quad \quad +\frac{|\gamma _{\sigma (n)}(C^{\mathbf{a}}_n,\varPi _n)-\gamma _{\sigma _0}(C^{\mathbf{a}}_n,\varPi _n)|}{\gamma _{\sigma _0}(M_n,\varPi _n)}\\&\quad =A_n + B_n, ~\text{(say) }. \end{aligned}$$

Note that while the term \(\gamma _{\sigma (n)}(C^{\mathbf{a}}_n,\varPi _n)\) is uniformly bounded, \(\left| \frac{1}{\gamma _{\sigma (n)}(M_n,\varPi _n)}-\frac{1}{\gamma _{\sigma _0}(M_n,\varPi _n)}\right| \) is a non-random quantity converging to 0 (this follows from Lemma L6 in Roy et al. 2020). Therefore, there exists a natural number \(n_0\) (independent of \(\mathbf{a}\)) such that for all \(n> n_0\), we have \(A_n \le \delta /4\) with probability one.

Now, \(\gamma _{\sigma _0}(M_n,\varPi _n)\) is a non-random quantity converging to \(\gamma _{\sigma _0}(M,\varPi )\). Also, from Lemma L6 in Roy et al. (2020), we get a non-random upper bound for \(|\gamma _{\sigma (n)}(C^{\mathbf{a}}_n,\varPi _n)-\gamma _{\sigma _0}(C^{\mathbf{a}}_n,\varPi _n)|\), which converges to 0. Since this upper bound does not depend on \(\mathbf{a}\), we get a natural number \(n_1\) (independent of \(\mathbf{a}\)) such that for all \(n> n_1\), \(B_n \le \delta /4\) with probability one.

Using these two facts, for all \(n>\max \{n_0,n_1\}\), we have \(A_n+B_n < \delta /2\) with probability one, and hence,

$$\begin{aligned} \Pr \Big (\left| {{\mathbb {T}}}_{n,\sigma (n)}(F^{\mathbf{a}})-{\mathbb T}_{n,\sigma _0}(F^{\mathbf{a}})\right| >\delta /2\Big )=0. \end{aligned}$$

This implies the finiteness of \(\sum _{n=1}^{\infty } m(n) \sup _{\mathbf{a}} \Pr (|{\mathbb T}_{n,\sigma (n)}(F^{\mathbf{a}})-{\mathbb T}_{n,\sigma _0}(F^{\mathbf{a}})|>\delta /2)\). \(\square \)


About this article


Cite this article

Roy, A., Sarkar, S., Ghosh, A.K. et al. On some consistent tests of mutual independence among several random vectors of arbitrary dimensions. Stat Comput 30, 1707–1723 (2020). https://doi.org/10.1007/s11222-020-09967-1

