Distributed hypothesis testing for large dimensional two-sample mean vectors

Yan, Lu; Hu, Jiang; Wu, Lixiu

doi:10.1007/s11222-024-10489-3

Original Paper
Published: 23 September 2024

Volume 34, article number 187, (2024)

Statistics and Computing

Lu Yan¹,
Jiang Hu¹ &
Lixiu Wu¹

Abstract

The advent of the big data era has brought massive datasets to the forefront of academic and industrial discussions. Owing to high communication costs and long computation times, traditional statistical methods may struggle to process such data centrally on a single server. A robust distributed system can effectively mitigate communication costs and enhance computational efficiency. However, the classical two-sample hypothesis testing problem has not yet been fully developed within a distributed framework. This paper explores the challenges of performing two-sample mean tests in a distributed framework, especially in the presence of unequal covariance matrices. By distributing samples across various nodes, we introduce two distributed test statistics: the blockwise linear two-sample test and the distributed two-sample test. Even when the sample size at each node is smaller than the dimension, the proposed test statistics maintain robust statistical properties. Both statistics are designed to improve communication efficiency, and thus reduce communication costs, compared with the full-sample statistic. Simulation experiments and empirical analyses further confirm the favorable statistical properties of the proposed test statistics.

Data Availability

No datasets were generated or analysed during the current study.

References

Afek, Y., Giladi, G., Patt-Shamir, B.: Distributed computing with the cloud. Distrib. Comput. 37(1), 1–18 (2024). https://doi.org/10.1007/s00446-024-00460-w
Bai, Z., Saranadasa, H.: Effect of high dimension: by an example of a two sample problem. Stat. Sin. 6, 311–329 (1996)
Bayle, P., Fan, J., Lou, Z.: Communication-efficient distributed estimation and inference for Cox’s model (2023). arXiv preprint arXiv:2302.12111
Bolón-Canedo, V., Sechidis, K., Sánchez-Marono, N., Alonso-Betanzos, A., Brown, G.: Exploring the consequences of distributed feature selection in DNA microarray data. In: 2017 International Joint Conference on Neural Networks (IJCNN), pp. 1665–1672. IEEE (2017)
Chen, S.X., Peng, L.: Distributed statistical inference for massive data. Ann. Stat. 49(5), 2851–2869 (2021). https://doi.org/10.1214/21-AOS2062
Chen, S., Qin, Y.: A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Stat. 38(2), 808–835 (2010). https://doi.org/10.1214/09-AOS716
Fan, J., Guo, Y., Wang, K.: Communication-efficient accurate statistical estimation. J. Am. Stat. Assoc. 118(542), 1000–1010 (2023). https://doi.org/10.1080/01621459.2021.1969238
Gregory, K.B., Carroll, R.J., Baladandayuthapani, V., Lahiri, S.N.: A two-sample test for equality of means in high dimension. J. Am. Stat. Assoc. 110(510), 837–849 (2015). https://doi.org/10.1080/01621459.2014.934826
Guestrin, C., Bodik, P., Thibaux, R., Paskin, M., Madden, S.: Distributed regression: an efficient framework for modeling sensor network data. In: Proceedings of the 3rd International Symposium on Information Processing in Sensor Networks (IPSN), pp. 1–10. IEEE (2004)
Hotelling, H.: The generalization of student’s ratio. Ann. Math. Stat. 2(3), 360–378 (1931). https://doi.org/10.1007/978-1-4612-0919-5_4
Hu, J., Bai, Z., Wang, C., Wang, W.: On testing the equality of high dimensional mean vectors with unequal covariance matrices. Ann. Inst. Stat. Math. 69, 365–387 (2017). https://doi.org/10.1007/s10463-015-0543-8
Huang, B., Liu, Y., Peng, L.: Distributed inference for two-sample u-statistics in massive data analysis. Scand. J. Stat. 50(3), 1090–1115 (2023). https://doi.org/10.1111/sjos.12620
Jiang, Y., Wang, X., Wen, C., Jiang, Y., Zhang, H.: Nonparametric two-sample tests of high dimensional mean vectors via random integration. J. Am. Stat. Assoc. 119(545), 701–714 (2024). https://doi.org/10.1080/01621459.2022.2141636
Kong, X., Harrar, S.W.: High-dimensional MANOVA under weak conditions. Statistics 55(2), 321–349 (2021). https://doi.org/10.1080/02331888.2021.1918693
Kumar, N., Sonowal, S.: Email spam detection using machine learning algorithms. In: 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA), pp. 108–113. IEEE (2020). https://doi.org/10.1109/ICIRCA48905.2020.9183098
Ledoit, O., Wolf, M.: Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. Ann. Stat. 30(4), 1081–1102 (2002). https://doi.org/10.1214/aos/1031689018
Li, J., Chen, S.: Two sample tests for high-dimensional covariance matrices. Ann. Stat. 40(2), 908–940 (2012). https://doi.org/10.1214/12-AOS993
Lopes, M., Jacob, L., Wainwright, M.J.: A more powerful two-sample test in high dimensions using random projection. Adv. Neural Inf. Process. Syst. 1(2), 1206–1214 (2011)
Mondal, P.K., Biswas, M., Ghosh, A.K.: On high dimensional two-sample tests based on nearest neighbors. J. Multivar. Anal. 141, 168–178 (2015). https://doi.org/10.1016/j.jmva.2015.07.002
Pan, R., Ren, T., Guo, B., Li, F., Li, G., Wang, H.: A note on distributed quantile regression by pilot sampling and one-step updating. J. Bus. Econ. Stat. 40(4), 1691–1700 (2022). https://doi.org/10.1080/07350015.2021.1961789
Santos, B.D.I., Hortaçsu, A., Wildenbeest, M.R.: Testing models of consumer search using data on web browsing and purchasing behavior. Am. Econ. Rev. 102(6), 2955–2980 (2012). https://doi.org/10.1257/aer.102.6.2955
Scherhag, U., Rathgeb, C., Busch, C.: Performance variation of morphed face image detection algorithms across different datasets. In: 2018 International Workshop on Biometrics and Forensics (IWBF), pp. 1–6. IEEE (2018)
Sharath, R., Nirupam, K., Sowmya, B., Srinivasa, K.: Data analytics to predict the income and economic hierarchy on census data. In: 2016 International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), pp. 249–254. IEEE (2016)
Szabó, B., Vuursteen, L., Van Zanten, H.: Optimal high-dimensional and nonparametric distributed testing under communication constraints. Ann. Stat. 51(3), 909–934 (2023). https://doi.org/10.1214/23-AOS2269
Thulin, M.: A high-dimensional two-sample test for the mean using random subspaces. Comput. Stat. Data Anal. 74, 26–38 (2014). https://doi.org/10.1016/j.csda.2013.12.003
Wang, F., Zhu, Y., Huang, D., Qi, H., Wang, H.: Distributed one-step upgraded estimation for non-uniformly and non-randomly distributed data. Comput. Stat. Data Anal. 162, 107265 (2021). https://doi.org/10.1016/j.csda.2021.107265
Xiaoyue, X., Shi, J., Song, K.: A distributed multiple sample testing for massive data. J. Appl. Stat. 50(3), 555–573 (2023). https://doi.org/10.1080/02664763.2021.1911967
Xu, G., Lin, L., Wei, P., Pan, W.: An adaptive two-sample test for high-dimensional means. Biometrika 103(3), 609–624 (2016). https://doi.org/10.1093/biomet/asw029
Xue, K., Yao, F.: Distribution and correlation-free two-sample test of high-dimensional means. Ann. Stat. 48(3), 1304–1328 (2020). https://doi.org/10.1214/19-AOS1848
Yu, J., Wang, H., Ai, M., Zhang, H.: Optimal distributed subsampling for maximum quasi-likelihood estimators with massive data. J. Am. Stat. Assoc. 117(537), 265–276 (2022). https://doi.org/10.1080/01621459.2020.1773832
Zhang, J., Pan, M.: A high-dimension two-sample test for the mean using cluster subspaces. Comput. Stat. Data Anal. 97, 87–97 (2016). https://doi.org/10.1016/j.csda.2015.12.004
Zhang, X., Liu, J., Zhu, Z.: Learning coefficient heterogeneity over networks: a distributed spanning-tree-based fused-lasso regression. J. Am. Stat. Assoc. 119(545), 485–497 (2024). https://doi.org/10.1080/01621459.2022.2126363

Acknowledgements

The authors would like to thank the Editor and three referees for their constructive comments that have significantly improved the paper. Jiang Hu was partially supported by NSFC Grants No.12292980, No.12292982, No.12171078, No.12326606, National Key R & D Program of China No.2020YFA0714102, and Fundamental Research Funds for the Central Universities, China No.2412023YQ003.

Author information

Authors and Affiliations

KLASMOE and School of Mathematics and Statistics, Northeast Normal University, Renmin Street, Changchun, 130024, Jilin, China
Lu Yan, Jiang Hu & Lixiu Wu

Contributions

All authors discussed the results and contributed to the final manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Jiang Hu.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Technical proofs

A.1 Proof of Theorem 1

Proof

On each computing node, we compute the local statistic.

$$\begin{aligned} T_{\textrm{dist1}}^{(k)}&=\dfrac{\sum _{i,j\in \mathcal {S}_x^k,i\ne j}\varvec{ X}_{i}^{\prime } \varvec{X}_{j}}{n_k\left( n_k-1\right) }+\dfrac{\sum _{i,j\in \mathcal {S}_y^k,i \ne j} \varvec{Y}_{i}^{\prime } \varvec{Y}_{ j}}{m_{k}\left( m_{k}-1\right) }\nonumber \\ &\quad -2\dfrac{\sum _{i\in \mathcal {S}_x^k} \sum _{j\in \mathcal {S}_y^k} \varvec{X}_{i}^{\prime } \varvec{Y}_{j}}{n_k m_{k}} \end{aligned}$$

(A1)

$$\begin{aligned}&=\left( \varvec{\bar{X}}^{(k)}-\varvec{\bar{Y}}^{(k)}\right) ^{\prime }\left( \varvec{\bar{X}}^{(k)}-\varvec{\bar{Y}}^{(k)}\right) \nonumber \\ &\quad -n_k^{-1} {\text {tr}}\varvec{ S}_{x}^{(k)}-m_k^{-1} {\text {tr}} \varvec{S}_{y}^{(k)}. \end{aligned}$$

(A2)

We now verify that (A2) equals (A1).

$$\begin{aligned} \dot{\varvec{I}}&=:\left( \varvec{\bar{X}}^{(k)}-\varvec{\bar{Y}}^{(k)}\right) ^{\prime }\left( \varvec{\bar{X}}^{(k)}-\varvec{\bar{Y}}^{(k)}\right) \\&=\varvec{\bar{X}}^{(k)\prime }\varvec{\bar{X}}^{(k)}-2\varvec{\bar{X}}^{(k)\prime }\varvec{\bar{Y}}^{(k)}+\varvec{\bar{Y}}^{(k)\prime }\varvec{\bar{Y}}^{(k)}\\&=\dfrac{\sum _{i,j\in \mathcal {S}_x^k,i\ne j}\varvec{ X}_{i}^{\prime } \varvec{X}_{j}}{n_k^2}+\dfrac{\sum _{i\in \mathcal {S}_x^k}\varvec{ X}_{i}^{\prime } \varvec{X}_{i}}{n_k^2}\\ &\qquad -2\dfrac{\sum _{i\in \mathcal {S}_x^k} \sum _{j\in \mathcal {S}_y^k} \varvec{X}_{i}^{\prime } \varvec{Y}_{j}}{n_k m_{k}}\\&\qquad +\dfrac{\sum _{i,j\in \mathcal {S}_y^k,i \ne j} \varvec{Y}_{i}^{\prime } \varvec{Y}_{ j}}{m_{k}^2}+\dfrac{\sum _{j\in \mathcal {S}_y^k} \varvec{Y}_{j}^{\prime } \varvec{Y}_{ j}}{m_{k}^2}, \\ {\text {tr}}\varvec{ S}_{x}^{(k)}&=\dfrac{1}{n_k-1}{\text {tr}}\sum _{i\in \mathcal {S}_x^k}\left( \varvec{X}_i-\bar{\varvec{X}}^{(k)}\right) \left( \varvec{ X}_i-\bar{\varvec{X}}^{(k)}\right) ^{\prime }\\&=\dfrac{1}{n_k-1}\sum _{i\in \mathcal {S}_x^k}\left( \varvec{X}_i-\bar{\varvec{X}}^{(k)}\right) ^{\prime }\left( \varvec{ X}_i-\bar{\varvec{X}}^{(k)}\right) \\&=\dfrac{1}{n_k-1}\sum _{i\in \mathcal {S}_x^k}\varvec{X}_i^\prime \varvec{X}_i-\frac{n_k}{n_k-1}\bar{\varvec{X}}^{(k)\prime }\bar{\varvec{X}}^{(k)}\\&=\dfrac{1}{n_k-1}\sum _{i\in \mathcal {S}_x^k}\varvec{X}_i^\prime \varvec{X}_i-\dfrac{\sum _{i,j\in \mathcal {S}_x^k,i\ne j}\varvec{ X}_{i}^{\prime } \varvec{X}_{j}}{n_k(n_k-1)}\\ &\quad -\dfrac{\sum _{i\in \mathcal {S}_x^k}\varvec{ X}_{i}^{\prime } \varvec{X}_{i}}{n_k( n_k-1)}\\&=\dfrac{1}{n_k}\sum _{i\in \mathcal {S}_x^k}\varvec{X}_i^\prime \varvec{X}_i-\dfrac{\sum _{i,j\in \mathcal {S}_x^k,i\ne j}\varvec{ X}_{i}^{\prime } \varvec{X}_{j}}{n_k(n_k-1)}. \end{aligned}$$

Similarly,

$$\begin{aligned} {\text {tr}}\varvec{S}_{y}^{(k)}=\dfrac{1}{m_k}\sum _{j\in \mathcal {S}_y^k}\varvec{Y}_j^\prime \varvec{Y}_j-\dfrac{\sum _{i,j\in \mathcal {S}_y^k,i\ne j}\varvec{ Y}_{i}^{\prime } \varvec{Y}_{j}}{m_k(m_k-1)}. \end{aligned}$$

Substituting $\dot{\varvec{I}}$, ${\text {tr}}\varvec{ S}_{x}^{(k)}$ and ${\text {tr}}\varvec{S}_{y}^{(k)}$ into (A2) recovers (A1). By Chen and Qin (2010), under $H_1$ and Assumptions 1–4,
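
The equality of (A1) and (A2) can also be checked numerically on a single node. The following is a minimal sketch using only the Python standard library; the function names, the simulated Gaussian data, and the sizes `p`, `n_k`, `m_k` are illustrative assumptions and do not come from the paper.

```python
import random

def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))

def mean_vec(rows):
    """Coordinate-wise sample mean of a list of vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def trace_cov(rows):
    """Trace of the unbiased sample covariance matrix (divisor n - 1)."""
    n = len(rows)
    centre = mean_vec(rows)
    total = 0.0
    for r in rows:
        d = [a - b for a, b in zip(r, centre)]
        total += dot(d, d)
    return total / (n - 1)

def t_local_a1(xs, ys):
    """Form (A1): sums of inner products over distinct pairs."""
    n, m = len(xs), len(ys)
    sxx = sum(dot(xs[i], xs[j]) for i in range(n) for j in range(n) if i != j)
    syy = sum(dot(ys[i], ys[j]) for i in range(m) for j in range(m) if i != j)
    sxy = sum(dot(x, y) for x in xs for y in ys)
    return sxx / (n * (n - 1)) + syy / (m * (m - 1)) - 2.0 * sxy / (n * m)

def t_local_a2(xs, ys):
    """Form (A2): squared mean difference minus scaled covariance traces."""
    n, m = len(xs), len(ys)
    d = [a - b for a, b in zip(mean_vec(xs), mean_vec(ys))]
    return dot(d, d) - trace_cov(xs) / n - trace_cov(ys) / m

# Hypothetical node with dimension larger than both local sample sizes.
rng = random.Random(0)
p, n_k, m_k = 8, 5, 6
xs = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n_k)]
ys = [[rng.gauss(0.5, 2.0) for _ in range(p)] for _ in range(m_k)]

a1 = t_local_a1(xs, ys)
a2 = t_local_a2(xs, ys)
print(abs(a1 - a2) < 1e-9)
```

Form (A2) only needs the local mean vectors and covariance traces, whereas (A1) loops over all pairs, which is why the second form is the more convenient one to compute on a node.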

$$\begin{aligned} {\text {Var}}\left( T_{\textrm{dist1}}^{(k)}\right) =\sigma _\textrm{dist1}^{(k)2}\left\{ 1+o(1)\right\} \end{aligned}$$

where under Assumption 2,

$$\begin{aligned} \sigma _\textrm{dist1}^{(k)2}&=\dfrac{2}{n_k\left( n_k-1\right) } {\text {tr}}\left( \varvec{\Sigma }_{x}^{2}\right) +\dfrac{2}{m_k\left( m_k-1\right) } {\text {tr}}\left( \varvec{\Sigma }_{y}^{2}\right) \nonumber \\ &\quad +\dfrac{4}{n_k m_k} {\text {tr}}\left( \varvec{\Sigma }_{x} \varvec{\Sigma }_{y}\right) , \end{aligned}$$

(A3)

and the $o(1)$ term vanishes under $H_0$. Consider the weighted statistic

$$\begin{aligned} T_{\textrm{dist1}}^*=\sum _{k=1}^{K}\omega _{k}^*T_{\textrm{dist1}}^{(k)}. \end{aligned}$$

On each node, we have

$$\begin{aligned} & \dfrac{T_{\textrm{dist1}}^{(k)}-\Vert \varvec{\mu }_{x}-\varvec{\mu }_{y}\Vert ^{2}}{\sqrt{{\text {Var}}\left( T_{\textrm{dist1}}^{(k)}\right) }} {\mathop {\rightarrow }\limits ^{d}} \mathcal {N}(0,1), \ \text {as}\ p\rightarrow \infty ,\ \\ & \quad M_k\rightarrow \infty , \forall k\in \left\{ 1,\dots ,K\right\} . \end{aligned}$$

We seek the weights that minimize the mean squared error:

$$\begin{aligned} \omega _{k}^*= & \arg \min _{\omega _{k}}\mathbb {E}\left( \sum _{k=1}^{K}\omega _{k}T_{\textrm{dist1}}^{(k)}-\Vert \varvec{\mu }_x-\varvec{\mu }_y\Vert ^2\right) ^2, \quad \\ & \quad {k}=1,\dots ,K. \end{aligned}$$

Using the method of Lagrange multipliers under the constraint $\sum _{{k}=1}^{K}\omega _{k}=1$, we have

$$\begin{aligned} L_n\left( \omega _1,\dots ,\omega _K;\lambda \right)&=\mathbb {E}\left( \sum _{k=1}^{K}\omega _{k}T_{\textrm{dist1}}^{(k)}-\Vert \varvec{\mu }_x-\varvec{\mu }_y\Vert ^2\right) ^2\\ &\quad +2\lambda \left( \sum _{{k}=1}^{K}\omega _{k}-1\right) \\&=\sum _{{k}=1}^{K}\omega _{k}^2\mathbb {E}\left( T_{\textrm{dist1}}^{(k)}-\Vert \varvec{\mu }_x-\varvec{\mu }_y\Vert ^2\right) ^2\\ &\quad +2\lambda \left( \sum _{{k}=1}^{K}\omega _{k}-1\right) . \end{aligned}$$

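Here the second equality uses the constraint $\sum _{{k}=1}^{K}\omega _{k}=1$, the independence of the nodes, and $\mathbb {E}\left( T_{\textrm{dist1}}^{(k)}\right) =\Vert \varvec{\mu }_x-\varvec{\mu }_y\Vert ^2$, so that the cross terms vanish. Setting the partial derivatives of $L_n\left( \omega _1,\dots ,\omega _K;\lambda \right) $ with respect to $\omega _{k}$, $k=1,\dots ,K$, and $\lambda $ to zero gives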

$$\begin{aligned} {\left\{ \begin{array}{ll} \dfrac{\partial L_n\left( \omega _1,\dots ,\omega _K;\lambda \right) }{\partial \omega _1}& =2\omega _1\mathbb {E}\left( T_{\textrm{dist1}}^{(1)}-\Vert \varvec{\mu }_x-\varvec{\mu }_y\Vert ^2\right) ^2\\ & \quad +2\lambda =0,\\ & \vdots \\ \dfrac{\partial L_n\left( \omega _1,\dots ,\omega _K;\lambda \right) }{\partial \omega _K}& =2\omega _K\mathbb {E}\left( T_{\textrm{dist1}}^{(K)}-\Vert \varvec{\mu }_x-\varvec{\mu }_y\Vert ^2\right) ^2\\ & \quad +2\lambda =0,\\ \dfrac{\partial L_n\left( \omega _1,\dots ,\omega _K;\lambda \right) }{\partial \lambda }& =\sum _{{k}=1}^{K}\omega _{k}-1=0. \end{array}\right. } \end{aligned}$$

Under $H_0$, solving this system yields

$$\begin{aligned} \begin{aligned} \omega _{k}^*=&\dfrac{1}{\sigma _\text {dist1}^{(k)2}}/\left( \sum _{i=1}^{K}\dfrac{1}{\sigma _\text {dist1}^{(i)2}}\right) ,\quad k=1,\dots ,K, \quad \\ \lambda =&-\dfrac{1}{\sum _{k=1}^{K}1/\sigma _\text {dist1}^{(k)2}}. \end{aligned} \end{aligned}$$

$\square $
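
The resulting weights are the usual inverse-variance weights. As a small illustrative sketch (the node variances in `sigma2` are made-up numbers, and the helper names are assumptions rather than the authors' code), one can check that they sum to one, that the products $\omega_k^*\sigma_\text{dist1}^{(k)2}$ are constant in $k$ as the first-order conditions require, and that they give a smaller weighted variance than equal weights:

```python
def inverse_variance_weights(variances):
    """Weights proportional to 1 / sigma_k^2, normalised to sum to one."""
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [x / total for x in inverse]

def weighted_variance(weights, variances):
    """Variance of sum_k w_k T_k for independent T_k with the given variances."""
    return sum(w * w * v for w, v in zip(weights, variances))

# Hypothetical local variances on K = 4 nodes.
sigma2 = [0.5, 1.0, 2.0, 4.0]
w_star = inverse_variance_weights(sigma2)

# First-order conditions: w_k * sigma_k^2 takes the same value on every node.
products = [w * v for w, v in zip(w_star, sigma2)]

equal = [1.0 / len(sigma2)] * len(sigma2)
print(w_star)
print(weighted_variance(w_star, sigma2), weighted_variance(equal, sigma2))
```

Nodes with smaller local variance (typically those holding more observations) therefore receive larger weight.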

A.2 Proof of Theorem 2

Proof

Because $\omega _{k}^*$ depends on unknown population quantities, we estimate it by

$$\begin{aligned} \begin{aligned} \hat{\omega }_k=\dfrac{1}{\hat{\sigma }_\text {dist1}^{(k)2}}/\left( \sum _{i=1}^{K}\dfrac{1}{\hat{\sigma }_\text {dist1}^{(i)2}}\right) ,\quad k=1,\dots ,K. \end{aligned} \end{aligned}$$

By Lemma 2 and the continuous mapping theorem,

$$\begin{aligned} \hat{\omega }_k{\mathop {\rightarrow }\limits ^{p}}\omega _k^{*},\quad k=1,\dots ,K, \ \text {as}\ p\rightarrow \infty \ \text {and} \ M_k\rightarrow \infty . \end{aligned}$$

$\square $

A.3 Proof of Theorem 3

Proof

On every node, we have

$$\begin{aligned} \dfrac{T_{\textrm{dist1}}^{(k)}-\Vert \varvec{\mu }_{x}-\varvec{\mu }_{y}\Vert ^{2}}{\sqrt{{\text {Var}}\left( T_{\textrm{dist1}}^{(k)}\right) }} {\mathop {\rightarrow }\limits ^{d}} \mathcal {N}(0,1), \end{aligned}$$

and the nodes are mutually independent. Hence, by Theorem 2, as $ p\rightarrow \infty $ and $M_k\rightarrow \infty $,

$$\begin{aligned} T_{\textrm{dist1}}&=\sum _{k=1}^{K}\hat{\omega }_{k}T_{\textrm{dist1}}^{(k)}\\&=\sum _{k=1}^{K}\left( \hat{\omega }_{k}-\omega _{k}^*\right) T_{\textrm{dist1}}^{(k)}+\sum _{k=1}^{K}\omega _{k}^*T_{\textrm{dist1}}^{(k)}\\&{\mathop {\rightarrow }\limits ^{d}}\sum _{k=1}^{K}\omega _{k}^*T_{\textrm{dist1}}^{(k)}.\\ \end{aligned}$$

Because $\sum _{k=1}^{K}\omega _{k}^*=1$ and the nodes are independent, as $ p\rightarrow \infty $ and $M_k\rightarrow \infty $ for all $k\in \left\{ 1,\dots ,K\right\} $,

$$\begin{aligned} T_{\textrm{dist1}}{\mathop {\rightarrow }\limits ^{d}}\mathcal {N}\left( \Vert \varvec{\mu }_{x}-\varvec{\mu }_{y}\Vert ^{2},\sum _{k=1}^{K}\omega _{k}^{*2}{\text {Var}}\left( T_{\textrm{dist1}}^{(k)}\right) \right) , \end{aligned}$$

i.e.

$$\begin{aligned} \dfrac{T_{\textrm{dist1}}-\Vert \varvec{\mu }_{x}-\varvec{\mu }_{y}\Vert ^{2}}{\sqrt{{\text {Var}}\left( T_{\textrm{dist1}}\right) }} {\mathop {\rightarrow }\limits ^{d}} \mathcal {N}(0,1), \end{aligned}$$

where

$$\begin{aligned} {\text {Var}}\left( T_{\textrm{dist1}}\right) =&\sum _{k=1}^{K}\omega _{k}^{*2}{\text {Var}}\left( T_{\textrm{dist1}}^{(k)}\right) \\ =&\sum _{k=1}^{K}\dfrac{\left\{ {\text {Var}}\left( T_{\textrm{dist1}}^{(k)}\right) \right\} ^{-2}}{\left( \sum _{i=1}^{K}1/{\text {Var}}\left( T_{\textrm{dist1}}^{(i)}\right) \right) ^2}{\text {Var}}\left( T_{\textrm{dist1}}^{(k)}\right) \\ =&\dfrac{1}{\sum _{i=1}^{K}1/{\text {Var}}\left( T_{\textrm{dist1}}^{(i)}\right) }. \end{aligned}$$

Under $H_1$ and Assumptions 1–4, ${\text {Var}}\left( T_{\textrm{dist1}}^{(k)}\right) =\sigma _\textrm{dist1}^{(k)2}\left\{ 1+o(1)\right\} ,$ and the $o(1)$ term vanishes under $H_0$. $\square $
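
To illustrate how the limiting distribution might be used in practice, the sketch below combines hypothetical node variances into ${\text {Var}}\left( T_{\textrm{dist1}}\right) $ and standardizes an observed value of the statistic. The numbers are invented, the function names are illustrative, and the upper-tail (one-sided) rejection rule is an assumption made here for the example rather than something stated in this appendix.

```python
import math

def combined_variance(node_variances):
    """Variance of the inverse-variance weighted statistic: 1 / sum_k (1 / v_k)."""
    return 1.0 / sum(1.0 / v for v in node_variances)

def upper_tail_p_value(statistic, variance, null_value=0.0):
    """Standardise the statistic and return (z, P(Z > z)) for standard normal Z.

    null_value is the centring ||mu_x - mu_y||^2, which is 0 under H_0.
    """
    z = (statistic - null_value) / math.sqrt(variance)
    return z, 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical variance values for K = 4 nodes and a hypothetical observed statistic.
node_var = [0.5, 1.0, 2.0, 4.0]
var_t = combined_variance(node_var)
z, p_value = upper_tail_p_value(1.0, var_t)
print(var_t, z, p_value)
```

In an actual analysis the node variances would be replaced by their estimates, as in Theorem 2.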

A.4 Proof of Theorem 4

Proof

We first calculate $\mathbb {E}\left( T_{\textrm{dist2}}\right) $ and ${\text {Var}}\left( T_{\textrm{dist2}}\right) $. Recall that

$$\begin{aligned} T_{\textrm{dist2}}= & \left( \varvec{\bar{X}}-\varvec{\bar{Y}}\right) ^{\prime }\left( \varvec{\bar{X}}-\varvec{\bar{Y}}\right) -n^{-2} \sum _{k=1}^{K}n_k{\text {tr}}\varvec{ S}_{x}^{(k)}\\ & -m^{-2} \sum _{\ell =1}^{L}m_\ell {\text {tr}}\varvec{S}_{y}^{(\ell )}, \end{aligned}$$

where

$$\begin{aligned} & {\text {tr}}\varvec{S}_x^{(k)}=\dfrac{1}{n_k}\sum _{i\in \mathcal {S}_x^k}\varvec{X}_i^{\prime }\varvec{X}_i-\dfrac{\sum _{i,j\in \mathcal {S}_x^k,i\ne j}\varvec{X}_{i}^{\prime } \varvec{X}_{j}}{n_k\left( n_k-1\right) }, \\ & {\text {tr}}\varvec{S}_y^{(\ell )}=\dfrac{1}{m_\ell }\sum _{i\in \mathcal {S}_y^\ell }\varvec{Y}_i^{\prime }\varvec{Y}_i-\dfrac{\sum _{i,j\in \mathcal {S}_y^\ell ,i \ne j}\varvec{Y}_{i}^{\prime } \varvec{Y}_{j}}{m_\ell \left( m_\ell -1\right) }. \end{aligned}$$

Here

$$\begin{aligned}&\mathbb {E}\left( \varvec{\bar{X}}-\varvec{\bar{Y}}\right) ^{\prime }\left( \varvec{\bar{X}}-\varvec{\bar{Y}}\right) =\dfrac{1}{n}{\text {tr}}\varvec{\Sigma }_x\\ &\quad +\dfrac{1}{m}{\text {tr}}\varvec{\Sigma }_y+\varvec{\mu }_x^{\prime }\varvec{\mu }_x+\varvec{\mu }_y^{\prime }\varvec{\mu }_y-2\varvec{\mu }_x^{\prime }\varvec{\mu }_y, \\&\mathbb {E}{\text {tr}}\varvec{S}_x^{(k)}={\text {tr}}\varvec{\Sigma }_x,\quad \mathbb {E}{\text {tr}}\varvec{S}_y^{(\ell )}={\text {tr}}\varvec{\Sigma }_y. \end{aligned}$$

Thus

$$\begin{aligned} \mathbb {E}\left( T_{\textrm{dist2}}\right) =\varvec{\mu }_x^{\prime }\varvec{\mu }_x+\varvec{\mu }_y^{\prime }\varvec{\mu }_y-2\varvec{\mu }_x^{\prime }\varvec{\mu }_y=\Vert \varvec{\mu }_x-\varvec{\mu }_y\Vert ^2. \end{aligned}$$

To compute the variance, we substitute the trace expressions above into $T_{\textrm{dist2}}$ and expand:

$$\begin{aligned} T_{\textrm{dist2}}&=\left( \varvec{\bar{X}}-\varvec{\bar{Y}}\right) ^{\prime }\left( \varvec{\bar{X}}-\varvec{\bar{Y}}\right) -n^{-2} \sum _{k=1}^{K}n_k{\text {tr}}\varvec{ S}_{x}^{(k)}\\ &\quad -m^{-2} \sum _{\ell =1}^{L}m_\ell {\text {tr}}\varvec{S}_{y}^{(\ell )}\\&=\dfrac{1}{n^2}\sum _{i,j\in \mathcal {S}_x,i\ne j}\varvec{ X}_i^{\prime }\varvec{ X}_j+\dfrac{1}{n^2}\sum _{k=1}^{K}\sum _{i,j\in \mathcal {S}_x^k,i \ne j}\dfrac{\varvec{ X}_i^{\prime }\varvec{ X}_j}{n_k-1}\\ &\quad -2\dfrac{\sum _{i\in \mathcal {S}_x}\sum _{j\in \mathcal {S}_y}\varvec{X}_i^{\prime }\varvec{Y}_j}{nm}\\&\quad +\dfrac{1}{m^2}\sum _{i,j\in \mathcal {S}_y,i\ne j}\varvec{ Y}_i^{\prime }\varvec{ Y}_j+\dfrac{1}{m^2}\sum _{\ell =1}^{L}\sum _{i,j\in \mathcal {S}_y^\ell ,i \ne j}\dfrac{\varvec{ Y}_i^{\prime }\varvec{ Y}_j}{m_\ell -1}.\\ \end{aligned}$$

Let

$$\begin{aligned} P_1= & \dfrac{1}{n^2}\sum _{i,j\in \mathcal {S}_x,i\ne j}\varvec{ X}_i^{\prime }\varvec{ X}_j,\quad \\ P_2= & \dfrac{1}{n^2}\sum _{k=1}^{K}\sum _{i,j\in \mathcal {S}_x^k,i \ne j}\dfrac{\varvec{ X}_i^{\prime }\varvec{ X}_j}{n_k-1}, \\ P_3= & -2\dfrac{\sum _{i\in \mathcal {S}_x}\sum _{j\in \mathcal {S}_y}\varvec{X}_i^{\prime }\varvec{Y}_j}{nm}, \\ P_4= & \dfrac{1}{m^2}\sum _{i,j\in \mathcal {S}_y,i\ne j}\varvec{ Y}_i^{\prime }\varvec{ Y}_j, \quad \\ P_5= & \dfrac{1}{m^2}\sum _{\ell =1}^{L}\sum _{i,j\in \mathcal {S}_y^\ell ,i \ne j}\dfrac{\varvec{ Y}_i^{\prime }\varvec{ Y}_j}{m_\ell -1}. \end{aligned}$$
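
The decomposition $T_{\textrm{dist2}}=P_1+P_2+P_3+P_4+P_5$ can be confirmed numerically. The sketch below is illustrative only: the node sizes, the simulated data, and the helper names are assumptions, and the data are pooled in one process for convenience, whereas a distributed implementation would assemble the same quantities from node-level sums.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def col_sums(rows):
    return [sum(col) for col in zip(*rows)]

def trace_cov(rows):
    """Trace of the unbiased sample covariance matrix of one node."""
    n = len(rows)
    centre = [s / n for s in col_sums(rows)]
    return sum(dot([a - b for a, b in zip(r, centre)],
                   [a - b for a, b in zip(r, centre)]) for r in rows) / (n - 1)

def off_diag_sum(rows):
    """Sum of u_i' u_j over ordered pairs i != j: |sum_i u_i|^2 - sum_i |u_i|^2."""
    total = col_sums(rows)
    return dot(total, total) - sum(dot(r, r) for r in rows)

def t_dist2(x_nodes, y_nodes):
    """T_dist2 from the pooled mean difference and node-level covariance traces."""
    xs = [r for node in x_nodes for r in node]
    ys = [r for node in y_nodes for r in node]
    n, m = len(xs), len(ys)
    d = [a / n - b / m for a, b in zip(col_sums(xs), col_sums(ys))]
    corr_x = sum(len(node) * trace_cov(node) for node in x_nodes) / n ** 2
    corr_y = sum(len(node) * trace_cov(node) for node in y_nodes) / m ** 2
    return dot(d, d) - corr_x - corr_y

def p_terms(x_nodes, y_nodes):
    """The five terms P_1, ..., P_5 of the expansion."""
    xs = [r for node in x_nodes for r in node]
    ys = [r for node in y_nodes for r in node]
    n, m = len(xs), len(ys)
    p1 = off_diag_sum(xs) / n ** 2
    p2 = sum(off_diag_sum(node) / (len(node) - 1) for node in x_nodes) / n ** 2
    p3 = -2.0 * dot(col_sums(xs), col_sums(ys)) / (n * m)
    p4 = off_diag_sum(ys) / m ** 2
    p5 = sum(off_diag_sum(node) / (len(node) - 1) for node in y_nodes) / m ** 2
    return p1, p2, p3, p4, p5

# Hypothetical layout: K = 3 nodes for X, L = 2 nodes for Y, dimension p = 6.
rng = random.Random(2)
p = 6
x_nodes = [[[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(s)] for s in (4, 5, 3)]
y_nodes = [[[rng.gauss(0.3, 1.5) for _ in range(p)] for _ in range(s)] for s in (3, 6)]

direct = t_dist2(x_nodes, y_nodes)
expanded = sum(p_terms(x_nodes, y_nodes))
print(abs(direct - expanded) < 1e-9)
```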

Then

$$\begin{aligned} {\text {Var}}\left( P_1\right) & =\dfrac{2\left( n-1\right) }{n^3}{\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) +\dfrac{4\left( n-1\right) ^2\varvec{\mu }_x^{\prime }\varvec{\Sigma }_{x}\varvec{\mu }_x}{n^3}, \\ {\text {Var}}\left( P_2\right) & =\dfrac{1}{n^4}\sum _{k=1}^{K}\dfrac{2n_k}{n_k-1}{\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) +\dfrac{4\varvec{\mu }_x^{\prime }\varvec{\Sigma }_{x}\varvec{\mu }_x}{n^3}, \\ {\text {Var}}\left( P_3\right) & =\dfrac{4}{nm}{\text {tr}}\left( \varvec{\Sigma }_{x}\varvec{\Sigma }_{y}\right) +\dfrac{4\varvec{\mu }_x^{\prime }\varvec{\Sigma }_{y}\varvec{\mu }_x}{m}+\dfrac{4\varvec{\mu }_y^{\prime }\varvec{\Sigma }_{x}\varvec{\mu }_y}{n}, \\ {\text {Var}}\left( P_4\right) & =\dfrac{2\left( m-1\right) }{m^3}{\text {tr}}\left( \varvec{\Sigma }_{y}^2\right) +\dfrac{4\left( m-1\right) ^2\varvec{\mu }_y^{\prime }\varvec{\Sigma }_{y}\varvec{\mu }_y}{m^3}, \\ {\text {Var}}\left( P_5\right) & =\dfrac{1}{m^4}\sum _{\ell =1}^{L}\dfrac{2m_\ell }{m_\ell -1}{\text {tr}}\left( \varvec{\Sigma }_{y}^2\right) +\dfrac{4\varvec{\mu }_y^{\prime }\varvec{\Sigma }_{y}\varvec{\mu }_y}{m^3}, \end{aligned}$$

Since the samples $\varvec{\mathcal {X}}_{n}$ and $\varvec{\mathcal {Y}}_{m}$ are independent, ${\text {Cov}}\left( P_1,P_4\right) =0$, ${\text {Cov}}\left( P_1,P_5\right) =0$, ${\text {Cov}}\left( P_2,P_4\right) =0$ and ${\text {Cov}}\left( P_2,P_5\right) =0$. Because the samples on different nodes are also independent, we obtain the following covariance results.

$$\begin{aligned}&{\text {Cov}}\left( P_1,P_2\right) \\ &\quad =\dfrac{1}{n^4}\sum _{k=1}^{K}{\text {Cov}}\left( \sum _{i,j\in \mathcal {S}_x^k,i\ne j}\varvec{ X}_i^{\prime }\varvec{ X}_j,\sum _{i,j\in \mathcal {S}_x^k,i\ne j}\dfrac{\varvec{ X}_i^{\prime }\varvec{ X}_j}{n_k-1}\right) \\&\qquad +\dfrac{1}{n^4}\sum _{k_1\ne k_2}^{K}{\text {Cov}}\left( \sum _{i,j\in \mathcal {S}_x^{k_1},i\ne j}\varvec{ X}_i^{\prime }\varvec{ X}_j,\sum _{i,j\in \mathcal {S}_x^{k_2},i\ne j}\dfrac{\varvec{ X}_i^{\prime }\varvec{ X}_j}{n_{k_2}-1}\right) \\&\qquad +\dfrac{1}{n^4}{\text {Cov}}\left( \sum _{k_1\ne k_2}^{K}\sum _{i\in \mathcal {S}_x^{k_1}}\sum _{j\in \mathcal {S}_x^{k_2}}\varvec{X}_i^{\prime }\varvec{X}_j, \sum _{k=1}^{K}\sum _{i,j\in \mathcal {S}_x^{k},i\ne j}\dfrac{\varvec{ X}_i^{\prime }\varvec{ X}_j}{n_k-1}\right) \\&\quad =\dfrac{2}{n^3}{\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) +\dfrac{1}{n^4}\sum _{k=1}^{K}4n_k\left( n_k-1\right) \varvec{\mu }_x^{\prime }\varvec{\Sigma }_{x}\varvec{\mu }_x\\ &\qquad +\dfrac{1}{n^4}\sum _{k_1\ne k_2}^{K}4n_{k_1}n_{k_2}\varvec{\mu }_x^{\prime }\varvec{\Sigma }_{x}\varvec{\mu }_x\\&\quad =\dfrac{2}{n^3}{\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) +\dfrac{4\left( n-1\right) }{n^3}\varvec{\mu }_x^{\prime }\varvec{\Sigma }_{x}\varvec{\mu }_x, \end{aligned}$$

$$\begin{aligned} {\text {Cov}}\left( P_1,P_3\right) & =-\dfrac{4\left( n-1\right) \varvec{\mu }_x^{\prime }\varvec{\Sigma }_{x}\varvec{\mu }_y}{n^2}, \\ {\text {Cov}}\left( P_2,P_3\right) & =-\dfrac{4\varvec{\mu }_x^{\prime }\varvec{\Sigma }_x\varvec{\mu }_y}{n^2}, \end{aligned}$$

$$\begin{aligned}&{\text {Cov}}\left( P_4,P_5\right) \\ &\quad =\dfrac{1}{m^4}\sum _{\ell =1}^{L}{\text {Cov}}\left( \sum _{i,j\in \mathcal {S}_y^\ell ,i\ne j}\varvec{ Y}_i^{\prime }\varvec{ Y}_j,\sum _{i,j\in \mathcal {S}_y^\ell ,i\ne j}\dfrac{\varvec{ Y}_i^{\prime }\varvec{ Y}_j}{m_\ell -1}\right) \\&\qquad +\dfrac{1}{m^4}\sum _{\ell _1\ne \ell _2}^{L}{\text {Cov}}\left( \sum _{i,j\in \mathcal {S}_y^{\ell _1},i\ne j}\varvec{ Y}_i^{\prime }\varvec{ Y}_j,\sum _{i,j\in \mathcal {S}_y^{\ell _2},i\ne j}\dfrac{\varvec{ Y}_i^{\prime }\varvec{ Y}_j}{m_{\ell _2}-1}\right) \\&\qquad +\dfrac{1}{m^4}{\text {Cov}}\left( \sum _{\ell _1\ne \ell _2}^{L}\sum _{i\in \mathcal {S}_y^{\ell _1}}\sum _{j\in \mathcal {S}_y^{\ell _2}}\varvec{Y}_i^{\prime }\varvec{Y}_j, \sum _{\ell =1}^{L}\sum _{i,j\in \mathcal {S}_y^{\ell },i\ne j}\dfrac{\varvec{ Y}_i^{\prime }\varvec{ Y}_j}{m_\ell -1}\right) \\&\quad =\dfrac{2}{m^3}{\text {tr}}\left( \varvec{\Sigma }_{y}^2\right) +\dfrac{1}{m^4}\sum _{\ell =1}^{L}4m_\ell \left( m_\ell -1\right) \varvec{\mu }_y^{\prime }\varvec{\Sigma }_{y}\varvec{\mu }_y\\ &\qquad +\dfrac{1}{m^4}\sum _{\ell _1\ne \ell _2}^{L}4m_{\ell _1}m_{\ell _2}\varvec{\mu }_y^{\prime }\varvec{\Sigma }_{y}\varvec{\mu }_y\\&\quad =\dfrac{2}{m^3}{\text {tr}}\left( \varvec{\Sigma }_{y}^2\right) +\dfrac{4\left( m-1\right) }{m^3}\varvec{\mu }_y^{\prime }\varvec{\Sigma }_{y}\varvec{\mu }_y, \end{aligned}$$

$$\begin{aligned} {\text {Cov}}\left( P_4,P_3\right) & =-\dfrac{4\left( m-1\right) \varvec{\mu }_y^{\prime }\varvec{\Sigma }_{y}\varvec{\mu }_x}{m^2},\\ {\text {Cov}}\left( P_5,P_3\right) & =-\dfrac{4\varvec{\mu }_y^{\prime }\varvec{\Sigma }_y\varvec{\mu }_x}{m^2}, \end{aligned}$$
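
The final simplification in ${\text {Cov}}\left( P_1,P_2\right) $ and ${\text {Cov}}\left( P_4,P_5\right) $ rests on the counting identity $\sum _{k}n_k(n_k-1)+\sum _{k_1\ne k_2}n_{k_1}n_{k_2}=n(n-1)$, which after scaling by $4/n^4$ gives the coefficient $4(n-1)/n^3$. A short exact check with rational arithmetic is sketched below; the node sizes are arbitrary illustrative values and the function name is an assumption.

```python
from fractions import Fraction

def mean_term_count(node_sizes):
    """Within-node plus cross-node pair counts appearing in Cov(P_1, P_2)."""
    within = sum(nk * (nk - 1) for nk in node_sizes)
    across = sum(a * b
                 for i, a in enumerate(node_sizes)
                 for j, b in enumerate(node_sizes)
                 if i != j)
    return within + across

sizes = [4, 5, 3, 8]          # hypothetical n_1, ..., n_K
n = sum(sizes)
count = mean_term_count(sizes)

# Coefficient of mu_x' Sigma_x mu_x after multiplying by 4 / n^4.
coefficient = Fraction(4 * count, n ** 4)
print(count == n * (n - 1), coefficient == Fraction(4 * (n - 1), n ** 3))
```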

In summary,

$$\begin{aligned}&{\text {Var}}\left( T_{\textrm{dist2}}\right) \\ &\quad =\left( \dfrac{2\left( n-1\right) }{n^3}+\dfrac{1}{n^4}\sum _{k=1}^{K}\dfrac{2n_k}{n_k-1}+\dfrac{4}{n^3}\right) {\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) \\ &\qquad +\dfrac{4}{nm}{\text {tr}}\left( \varvec{\Sigma }_{x}\varvec{\Sigma }_{y}\right) \\&\qquad +\left( \dfrac{2\left( m-1\right) }{m^3}+\dfrac{1}{m^4}\sum _{\ell =1}^{L}\dfrac{2m_\ell }{m_\ell -1}+\dfrac{4}{m^3}\right) {\text {tr}}\left( \varvec{\Sigma }_{y}^2\right) \\&\qquad +\left( \dfrac{4\left( n-1\right) ^2}{n^3}+\dfrac{4}{n^3}+\dfrac{8\left( n-1\right) }{n^3}\right) \varvec{\mu }_x^{\prime }\varvec{\Sigma }_{x}\varvec{\mu }_x\\&\qquad +\dfrac{4\varvec{\mu }_x^{\prime }\varvec{\Sigma }_{y}\varvec{\mu }_x}{m}+\dfrac{4\varvec{\mu }_y^{\prime }\varvec{\Sigma }_{x}\varvec{\mu }_y}{n}-\dfrac{8}{n}\varvec{\mu }_x^{\prime }\varvec{\Sigma }_x\varvec{\mu }_y-\dfrac{8}{m}\varvec{\mu }_y^{\prime }\varvec{\Sigma }_y\varvec{\mu }_x\\&\qquad +\left( \dfrac{4\left( m-1\right) ^2}{m^3}+\dfrac{4}{m^3}+\dfrac{8\left( m-1\right) }{m^3}\right) \varvec{\mu }_y^{\prime }\varvec{\Sigma }_{y}\varvec{\mu }_y\\&\quad =\dfrac{2}{n\left( n-1\right) }{\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) +\dfrac{2}{m\left( m-1\right) }{\text {tr}}\left( \varvec{\Sigma }_{y}^2\right) \\ &\qquad +\dfrac{4}{nm}{\text {tr}}\left( \varvec{\Sigma }_{x}\varvec{\Sigma }_{y}\right) \\&\qquad +\dfrac{4}{n}\left( \varvec{\mu }_x-\varvec{\mu }_y\right) ^{\prime }\varvec{\Sigma }_x\left( \varvec{\mu }_x-\varvec{\mu }_y\right) \\ &\qquad +\dfrac{4}{m}\left( \varvec{\mu }_x-\varvec{\mu }_y\right) ^{\prime }\varvec{\Sigma }_y\left( \varvec{\mu }_x-\varvec{\mu }_y\right) \\&\qquad +\left( \dfrac{1}{n^4}\sum _{k=1}^{K}\dfrac{2n_k}{n_k-1}-\dfrac{2}{n^3\left( n-1\right) }\right) {\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) \\&\qquad +\left( \dfrac{1}{m^4}\sum _{\ell =1}^{L}\dfrac{2m_\ell }{m_\ell -1}-\dfrac{2}{m^3\left( m-1\right) }\right) {\text {tr}}\left( 
\varvec{\Sigma }_{y}^2\right) .\\ \end{aligned}$$

Thus, under $H_0$,

$$\begin{aligned} {\text {Var}}\left( T_{\textrm{dist2}}\right)&=\dfrac{2}{n\left( n-1\right) }{\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) +\dfrac{2}{m\left( m-1\right) }{\text {tr}}\left( \varvec{\Sigma }_{y}^2\right) \\ &\quad +\dfrac{4}{nm}{\text {tr}}\left( \varvec{\Sigma }_{x}\varvec{\Sigma }_{y}\right) \\&\quad +\left( \dfrac{1}{n^4}\sum _{k=1}^{K}\dfrac{2n_k}{n_k-1}-\dfrac{2}{n^3\left( n-1\right) }\right) {\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) \\&\quad +\left( \dfrac{1}{m^4}\sum _{\ell =1}^{L}\dfrac{2m_\ell }{m_\ell -1}-\dfrac{2}{m^3\left( m-1\right) }\right) {\text {tr}}\left( \varvec{\Sigma }_{y}^2\right) .\\&=\dfrac{2}{n\left( n-1\right) }{\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) (1+o(\dfrac{1}{n^2}))\\ &\quad +\dfrac{2}{m\left( m-1\right) }{\text {tr}}\left( \varvec{\Sigma }_{y}^2\right) (1+o(\dfrac{1}{m^2}))\\&\quad +\dfrac{4}{nm}{\text {tr}}\left( \varvec{\Sigma }_{x}\varvec{\Sigma }_{y}\right) \\ \end{aligned}$$

Under $H_1$,

$$\begin{aligned} {\text {Var}}\left( T_{\textrm{dist2}}\right) =\sigma _\textrm{dist2}^2\left\{ 1+o(1)\right\} . \end{aligned}$$

where under Assumption 2,

$$\begin{aligned} \sigma _\textrm{dist2}^2&=\dfrac{2}{n\left( n-1\right) } {\text {tr}}\left( \varvec{\Sigma }_{x}^{2}\right) +\dfrac{2}{m\left( m-1\right) } {\text {tr}}\left( \varvec{\Sigma }_{y}^{2}\right) \\ &\quad +\dfrac{4}{nm} {\text {tr}}\left( \varvec{\Sigma }_{x} \varvec{\Sigma }_{y}\right) \\&\quad +\left( \dfrac{1}{n^4}\sum _{k=1}^{K}\dfrac{2n_k}{n_k-1}-\dfrac{2}{n^3\left( n-1\right) }\right) {\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) \\&\quad +\left( \dfrac{1}{m^4}\sum _{\ell =1}^{L}\dfrac{2m_\ell }{m_\ell -1}-\dfrac{2}{m^3\left( m-1\right) }\right) {\text {tr}}\left( \varvec{\Sigma }_{y}^2\right) .\\ \end{aligned}$$
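
To get a feel for the size of the extra ${\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) $ coefficient in $\sigma _\textrm{dist2}^2$ relative to the leading coefficient $2/\{n(n-1)\}$, the following sketch evaluates both exactly for a hypothetical partition of $n=1000$ observations into $K=20$ equal nodes. The partition and the function names are illustrative assumptions.

```python
from fractions import Fraction

def partition_correction(node_sizes):
    """(1/n^4) sum_k 2 n_k / (n_k - 1)  -  2 / (n^3 (n - 1))."""
    n = sum(node_sizes)
    first = sum(Fraction(2 * nk, nk - 1) for nk in node_sizes) / n ** 4
    return first - Fraction(2, n ** 3 * (n - 1))

def leading_coefficient(n):
    """Coefficient 2 / (n (n - 1)) of tr(Sigma_x^2) in the full-sample variance."""
    return Fraction(2, n * (n - 1))

sizes = [50] * 20                     # hypothetical: K = 20 nodes of 50 observations
n = sum(sizes)
ratio = partition_correction(sizes) / leading_coefficient(n)
print(float(ratio))

# With a single node (K = 1) the correction is exactly zero.
print(partition_correction([n]) == 0)
```

In this example the correction is several orders of magnitude smaller than the leading term, and it disappears entirely when all observations sit on one node.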

We now establish the asymptotic normality of $T_{\textrm{dist2}}$. Let

$$\begin{aligned} T&=T_{\textrm{dist2}}-T_{\textrm{cq}}\\&=n^{-1}{\text {tr}}\varvec{ S}_{x}+m^{-1}{\text {tr}}\varvec{S}_{y}-n^{-2} \sum _{k=1}^{K}n_k{\text {tr}}\varvec{ S}_{x}^{(k)}\\ &\quad -m^{-2} \sum _{\ell =1}^{L}m_\ell {\text {tr}}\varvec{S}_{y}^{(\ell )}\\&=n^{-1} \left( {\text {tr}}\varvec{ S}_{x}-\sum _{k=1}^{K}\dfrac{n_k}{n}{\text {tr}}\varvec{ S}_{x}^{(k)}\right) \\ &\quad +m^{-1} \left( {\text {tr}}\varvec{ S}_{y}-\sum _{\ell =1}^{L}\dfrac{m_\ell }{m}{\text {tr}}\varvec{ S}_{y}^{(\ell )}\right) . \end{aligned}$$

We know that $\varvec{ S}_{x}{\mathop {\rightarrow }\limits ^{p}}\varvec{\Sigma }_{x}$ as $n\rightarrow \infty $ and that $\varvec{S}_{x}^{(k)}{\mathop {\rightarrow }\limits ^{p}}\varvec{\Sigma }_{x}$ as $n_k\rightarrow \infty $. Hence, as $n_k\rightarrow \infty $ with $n=\sum _{k=1}^{K}n_k$,

$$\begin{aligned} {\text {tr}}\varvec{ S}_{x}-\sum _{k=1}^{K}\dfrac{n_k}{n}{\text {tr}}\varvec{ S}_{x}^{(k)}{\mathop {\rightarrow }\limits ^{p}}0. \end{aligned}$$

Similarly, as $m_\ell \rightarrow \infty $, $m=\sum _{\ell =1}^{L}m_\ell $,

$$\begin{aligned} {\text {tr}}\varvec{ S}_{y}-\sum _{\ell =1}^{L}\dfrac{m_\ell }{m}{\text {tr}}\varvec{ S}_{y}^{(\ell )}{\mathop {\rightarrow }\limits ^{p}}0. \end{aligned}$$

So

$$\begin{aligned} T_{\textrm{dist2}}-T_{\textrm{cq}}{\mathop {\rightarrow }\limits ^{p}}0. \end{aligned}$$

We know that $\dfrac{T_{\textrm{cq}}-\Vert \varvec{\mu }_{x}-\varvec{\mu }_{y}\Vert ^{2}}{\sqrt{{\text {Var}}\left( T_{\textrm{cq}}\right) }} {\mathop {\rightarrow }\limits ^{d}} \mathcal {N}(0,1)$. Finally, by Slutsky's theorem, we have

$$\begin{aligned} \dfrac{T_{\textrm{dist2}}-\Vert \varvec{\mu }_{x}-\varvec{\mu }_{y}\Vert ^{2}}{\sqrt{{\text {Var}}\left( T_{\textrm{dist2}}\right) }} {\mathop {\rightarrow }\limits ^{d}} \mathcal {N}(0,1). \end{aligned}$$
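The Slutsky step can be spelled out with the following decomposition, given here as a sketch under two assumptions that are not stated in this form in the text: that ${\text {Var}}\left( T_{\textrm{cq}}\right) /{\text {Var}}\left( T_{\textrm{dist2}}\right) \rightarrow 1$, and that the remainder $T_{\textrm{dist2}}-T_{\textrm{cq}}$ is negligible relative to $\sqrt{{\text {Var}}\left( T_{\textrm{dist2}}\right) }$, which is slightly stronger than convergence to zero in probability.

$$\begin{aligned} \dfrac{T_{\textrm{dist2}}-\Vert \varvec{\mu }_{x}-\varvec{\mu }_{y}\Vert ^{2}}{\sqrt{{\text {Var}}\left( T_{\textrm{dist2}}\right) }}&=\sqrt{\dfrac{{\text {Var}}\left( T_{\textrm{cq}}\right) }{{\text {Var}}\left( T_{\textrm{dist2}}\right) }}\cdot \dfrac{T_{\textrm{cq}}-\Vert \varvec{\mu }_{x}-\varvec{\mu }_{y}\Vert ^{2}}{\sqrt{{\text {Var}}\left( T_{\textrm{cq}}\right) }}\\ &\quad +\dfrac{T_{\textrm{dist2}}-T_{\textrm{cq}}}{\sqrt{{\text {Var}}\left( T_{\textrm{dist2}}\right) }}. \end{aligned}$$

Under these two assumptions the first factor tends to one and the last term tends to zero in probability, so the limit is inherited from the asymptotic normality of $T_{\textrm{cq}}$.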

$\square $
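The quantity ${\text {tr}}\varvec{S}_{x}-\sum _{k}(n_k/n){\text {tr}}\varvec{S}_{x}^{(k)}$ that drives $T_{\textrm{dist2}}-T_{\textrm{cq}}$ in the proof above can be computed directly on simulated data. The sketch below uses only the Python standard library. It assumes the usual unbiased sample covariance (divisor one less than the sample size), which may differ from the paper's exact definition, and the helper names `trace_cov` and `trace_gap` are hypothetical.

```python
import random


def trace_cov(rows):
    """Trace of the sample covariance of `rows` (divisor len(rows) - 1)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    # The trace is the sum of per-coordinate sample variances.
    ss = sum((r[j] - means[j]) ** 2 for r in rows for j in range(p))
    return ss / (n - 1)


def trace_gap(blocks):
    """tr S_x - sum_k (n_k / n) tr S_x^{(k)} for data split across nodes."""
    pooled = [row for block in blocks for row in block]
    n = len(pooled)
    weighted = sum(len(b) / n * trace_cov(b) for b in blocks)
    return trace_cov(pooled) - weighted


random.seed(0)
p, K = 20, 4  # illustrative dimension and number of nodes
for n_k in (10, 100, 1000):
    blocks = [[[random.gauss(0.0, 1.0) for _ in range(p)]
               for _ in range(n_k)] for _ in range(K)]
    print(n_k, trace_gap(blocks) / p)
```

With standard normal data one would expect the printed gap per coordinate to shrink in magnitude as the node sample size grows, in line with the convergence claimed in the proof.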

A.5 Proof of Theorem 5

Proof

By Lemma 2 in Hu et al. (2017), under Model II and Assumptions 1, 2, 5, 6, as $p\rightarrow \infty $, $n_k\rightarrow \infty $ and $m_\ell \rightarrow \infty $,

$$\begin{aligned} & \dfrac{\widehat{{\text {tr}}\left( \varvec{\Sigma }_{x}^{2}\right) }^{(k)}}{{\text {tr}}\left( \varvec{\Sigma }_{x}^{2}\right) }{\mathop {\rightarrow }\limits ^{p}}1, \quad k=1,\dots , K,\quad \\ & \dfrac{\widehat{{\text {tr}}\left( \varvec{\Sigma }_{y}^{2}\right) }^{(\ell )}}{{\text {tr}}\left( \varvec{\Sigma }_{y}^{2}\right) }{\mathop {\rightarrow }\limits ^{p}}1, \quad \ell =1,\dots , L, \end{aligned}$$

and

$$\begin{aligned} & \dfrac{\widehat{{\text {tr}} \left( \varvec{\Sigma }_{x} \varvec{\Sigma }_{y}\right) }_{d2}}{{\text {tr}} \left( \varvec{\Sigma }_{x} \varvec{\Sigma }_{y}\right) }{\mathop {\rightarrow }\limits ^{p}}1\quad \text {for all }\ k\in \{1,\dots ,K\} \ \text {and }\ \ell \in \{1,\dots ,L\}. \end{aligned}$$

Then, because the weights $n_k/n$ and $m_\ell /m$ each sum to one, as $p\rightarrow \infty $, $n_k\rightarrow \infty $ and $m_\ell \rightarrow \infty $,

$$\begin{aligned} & \dfrac{\widehat{{\text {tr}}\left( \varvec{\Sigma }_{x}^{2}\right) }_{d2}}{{\text {tr}}\left( \varvec{\Sigma }_{x}^{2}\right) }=\sum _{k=1}^{K}\dfrac{n_k}{n}\dfrac{\widehat{{\text {tr}}\left( \varvec{\Sigma }_{x}^{2}\right) }^{(k)}}{{\text {tr}}\left( \varvec{\Sigma }_{x}^{2}\right) }{\mathop {\rightarrow }\limits ^{p}}1, \\ & \dfrac{\widehat{{\text {tr}}\left( \varvec{\Sigma }_{y}^{2}\right) }_{d2}}{{\text {tr}}\left( \varvec{\Sigma }_{y}^{2}\right) }=\sum _{\ell =1}^{L}\dfrac{m_\ell }{m}\dfrac{\widehat{{\text {tr}}\left( \varvec{\Sigma }_{y}^{2}\right) }^{(\ell )}}{{\text {tr}}\left( \varvec{\Sigma }_{y}^{2}\right) }{\mathop {\rightarrow }\limits ^{p}}1. \end{aligned}$$

$\square $
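The combination step used in the proof above, in which node-level trace estimates are averaged with weights $n_k/n$, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `combine_node_estimates` and the example numbers are hypothetical, and how each node computes its own estimate is not covered here.

```python
def combine_node_estimates(estimates, sizes):
    """Sample-size-weighted average: sum_k (n_k / n) * estimate_k.

    estimates: node-level estimates, e.g. of tr(Sigma_x^2)
    sizes: matching node sample sizes n_k
    """
    if not sizes or len(estimates) != len(sizes):
        raise ValueError("need one estimate per node and at least one node")
    n = sum(sizes)
    return sum(nk / n * est for est, nk in zip(estimates, sizes))


# Hypothetical node-level estimates from three nodes of unequal size.
print(combine_node_estimates([98.0, 103.0, 101.0], [40, 60, 100]))
```

Because the weights sum to one, the combined value always lies between the smallest and largest node-level estimates, so each node only needs to send one scalar and its sample size to the central server.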

Appendix B Supplementary figures

B.1 Supplementary figures of the impact of dimension

See Figs. 14, 15, 16, 17, 18, 19, 20 and 21.

B.2 Supplementary figures of the impact of the number of nodes

See Figs. 22, 23, 24, 25, 26, 27, 28 and 29.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yan, L., Hu, J. & Wu, L. Distributed hypothesis testing for large dimensional two-sample mean vectors. Stat Comput 34, 187 (2024). https://doi.org/10.1007/s11222-024-10489-3

Received: 19 August 2024
Accepted: 23 August 2024
Published: 23 September 2024
DOI: https://doi.org/10.1007/s11222-024-10489-3
