Distributed hypothesis testing for large dimensional two-sample mean vectors

  • Original Paper
Statistics and Computing

Abstract

The advent of the big data era has brought massive datasets to the forefront of academic and industrial discussions. Because of high communication costs and long computation times, traditional statistical methods may struggle to process such data centrally on a single server. A robust distributed system can effectively mitigate communication costs and enhance computational efficiency. However, the classical two-sample hypothesis testing problem in statistical analysis has not yet been fully developed within a distributed framework. This paper explores the challenges of performing two-sample mean tests in a distributed framework, especially in the presence of unequal covariance matrices. By distributing samples across various nodes, we introduce two distributed test statistics: the blockwise linear two-sample test and the distributed two-sample test. Even when the sample size on each node is smaller than the dimension, the proposed test statistics maintain robust statistical properties. Both statistics are designed to improve communication efficiency, and thus reduce communication costs, relative to the full-sample statistic. Simulation experiments and empirical analyses further confirm the favorable statistical properties of the proposed test statistics.

Data Availability

No datasets were generated or analysed during the current study.

Acknowledgements

The authors would like to thank the Editor and three referees for their constructive comments that have significantly improved the paper. Jiang Hu was partially supported by NSFC Grants No.12292980, No.12292982, No.12171078, No.12326606, National Key R & D Program of China No.2020YFA0714102, and Fundamental Research Funds for the Central Universities, China No.2412023YQ003.

Author information

Contributions

All authors discussed the results and contributed to the final manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Jiang Hu.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Technical proofs

A.1 Proof of Theorem 1

Proof

On each computing node \(k\), we compute the local statistic

$$\begin{aligned} T_{\textrm{dist1}}^{(k)}&=\dfrac{\sum _{i,j\in \mathcal {S}_x^k,i\ne j}\varvec{ X}_{i}^{\prime } \varvec{X}_{j}}{n_k\left( n_k-1\right) }+\dfrac{\sum _{i,j\in \mathcal {S}_y^k,i \ne j} \varvec{Y}_{i}^{\prime } \varvec{Y}_{ j}}{m_{k}\left( m_{k}-1\right) }\nonumber \\ &\quad -2\dfrac{\sum _{i\in \mathcal {S}_x^k} \sum _{j\in \mathcal {S}_y^k} \varvec{X}_{i}^{\prime } \varvec{Y}_{j}}{n_k m_{k}} \end{aligned}$$
(A1)
$$\begin{aligned}&=\left( \varvec{\bar{X}}^{(k)}-\varvec{\bar{Y}}^{(k)}\right) ^{\prime }\left( \varvec{\bar{X}}^{(k)}-\varvec{\bar{Y}}^{(k)}\right) \nonumber \\ &\quad -n_k^{-1} {\text {tr}}\varvec{ S}_{x}^{(k)}-m_k^{-1} {\text {tr}} \varvec{S}_{y}^{(k)}. \end{aligned}$$
(A2)

We now verify that (A1) and (A2) are equal. Write

$$\begin{aligned} \dot{\varvec{I}}&=:\left( \varvec{\bar{X}}^{(k)}-\varvec{\bar{Y}}^{(k)}\right) ^{\prime }\left( \varvec{\bar{X}}^{(k)}-\varvec{\bar{Y}}^{(k)}\right) \\&=\varvec{\bar{X}}^{(k)\prime }\varvec{\bar{X}}^{(k)}-2\varvec{\bar{X}}^{(k)\prime }\varvec{\bar{Y}}^{(k)}+\varvec{\bar{Y}}^{(k)\prime }\varvec{\bar{Y}}^{(k)}\\&=\dfrac{\sum _{i,j\in \mathcal {S}_x^k,i\ne j}\varvec{ X}_{i}^{\prime } \varvec{X}_{j}}{n_k^2}+\dfrac{\sum _{i\in \mathcal {S}_x^k}\varvec{ X}_{i}^{\prime } \varvec{X}_{i}}{n_k^2}\\ &\qquad -2\dfrac{\sum _{i\in \mathcal {S}_x^k} \sum _{j\in \mathcal {S}_y^k} \varvec{X}_{i}^{\prime } \varvec{Y}_{j}}{n_k m_{k}}\\&\qquad +\dfrac{\sum _{i,j\in \mathcal {S}_y^k,i \ne j} \varvec{Y}_{i}^{\prime } \varvec{Y}_{ j}}{m_{k}^2}+\dfrac{\sum _{j\in \mathcal {S}_y^k} \varvec{Y}_{j}^{\prime } \varvec{Y}_{ j}}{m_{k}^2}, \\ {\text {tr}}\varvec{ S}_{x}^{(k)}&=\dfrac{1}{n_k-1}{\text {tr}}\sum _{i\in \mathcal {S}_x^k}\left( \varvec{X}_i-\bar{\varvec{X}}^{(k)}\right) \left( \varvec{ X}_i-\bar{\varvec{X}}^{(k)}\right) ^{\prime }\\&=\dfrac{1}{n_k-1}\sum _{i\in \mathcal {S}_x^k}\left( \varvec{X}_i-\bar{\varvec{X}}^{(k)}\right) ^{\prime }\left( \varvec{ X}_i-\bar{\varvec{X}}^{(k)}\right) \\&=\dfrac{1}{n_k-1}\sum _{i\in \mathcal {S}_x^k}\varvec{X}_i^\prime \varvec{X}_i-\frac{n_k}{n_k-1}\bar{\varvec{X}}^{(k)\prime }\bar{\varvec{X}}^{(k)}\\&=\dfrac{1}{n_k-1}\sum _{i\in \mathcal {S}_x^k}\varvec{X}_i^\prime \varvec{X}_i-\dfrac{\sum _{i,j\in \mathcal {S}_x^k,i\ne j}\varvec{ X}_{i}^{\prime } \varvec{X}_{j}}{n_k(n_k-1)}\\ &\quad -\dfrac{\sum _{i\in \mathcal {S}_x^k}\varvec{ X}_{i}^{\prime } \varvec{X}_{i}}{n_k( n_k-1)}\\&=\dfrac{1}{n_k}\sum _{i\in \mathcal {S}_x^k}\varvec{X}_i^\prime \varvec{X}_i-\dfrac{\sum _{i,j\in \mathcal {S}_x^k,i\ne j}\varvec{ X}_{i}^{\prime } \varvec{X}_{j}}{n_k(n_k-1)}. \end{aligned}$$

Similarly,

$$\begin{aligned} {\text {tr}}\varvec{S}_{y}^{(k)}=\dfrac{1}{m_k}\sum _{j\in \mathcal {S}_y^k}\varvec{Y}_j^\prime \varvec{Y}_j-\dfrac{\sum _{i,j\in \mathcal {S}_y^k,i\ne j}\varvec{ Y}_{i}^{\prime } \varvec{Y}_{j}}{m_k(m_k-1)}. \end{aligned}$$

Substituting \(\dot{\varvec{I}}\), \({\text {tr}}\varvec{ S}_{x}^{(k)}\) and \({\text {tr}}\varvec{S}_{y}^{(k)}\) into (A2) shows that (A2) equals (A1). By Chen and Qin (2010), under \(H_1\) and Assumptions 1–4,

$$\begin{aligned} {\text {Var}}\left( T_{\textrm{dist1}}^{(k)}\right) =\sigma _\textrm{dist1}^{(k)2}\left\{ 1+o(1)\right\} \end{aligned}$$

where under Assumption 2,

$$\begin{aligned} \sigma _\textrm{dist1}^{(k)2}&=\dfrac{2}{n_k\left( n_k-1\right) } {\text {tr}}\left( \varvec{\Sigma }_{x}^{2}\right) +\dfrac{2}{m_k\left( m_k-1\right) } {\text {tr}}\left( \varvec{\Sigma }_{y}^{2}\right) \nonumber \\ &\quad +\dfrac{4}{n_k m_k} {\text {tr}}\left( \varvec{\Sigma }_{x} \varvec{\Sigma }_{y}\right) , \end{aligned}$$
(A3)

and the o(1) term disappears under \(H_0\).
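
The equality of (A1) and (A2) shown above can also be checked numerically. The following is a minimal sketch, assuming NumPy and simulated Gaussian data on a single node; the function names `pairwise_form` and `mean_trace_form` are hypothetical labels introduced here, not notation from the paper.

```python
import numpy as np

def pairwise_form(X, Y):
    """Local statistic written as in (A1): sums of cross products."""
    n, m = X.shape[0], Y.shape[0]
    Gx, Gy = X @ X.T, Y @ Y.T
    sum_xx = Gx.sum() - np.trace(Gx)   # sum over i != j of X_i' X_j
    sum_yy = Gy.sum() - np.trace(Gy)   # sum over i != j of Y_i' Y_j
    sum_xy = (X @ Y.T).sum()           # sum over all i, j of X_i' Y_j
    return sum_xx / (n * (n - 1)) + sum_yy / (m * (m - 1)) - 2 * sum_xy / (n * m)

def mean_trace_form(X, Y):
    """Local statistic written as in (A2): node means and covariance traces."""
    n, m = X.shape[0], Y.shape[0]
    d = X.mean(axis=0) - Y.mean(axis=0)
    tr_sx = X.var(axis=0, ddof=1).sum()   # trace of the sample covariance of X
    tr_sy = Y.var(axis=0, ddof=1).sum()   # trace of the sample covariance of Y
    return d @ d - tr_sx / n - tr_sy / m

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 50))          # n_k = 8 observations, p = 50 (n_k < p)
Y = rng.normal(size=(6, 50)) * 2.0    # m_k = 6 observations, different scale
a1 = pairwise_form(X, Y)
a2 = mean_trace_form(X, Y)
```

In this sketch the form (A2) needs only per-coordinate means and variances, whereas (A1) forms the full Gram matrices, which is one practical reason to prefer (A2) when evaluating the statistic.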

Define the optimally weighted statistic

$$\begin{aligned} T_{\textrm{dist1}}^*=\sum _{k=1}^{K}\omega _{k}^*T_{\textrm{dist1}}^{(k)}. \end{aligned}$$

On each node, we have

$$\begin{aligned} & \dfrac{T_{\textrm{dist1}}^{(k)}-\Vert \varvec{\mu }_{x}-\varvec{\mu }_{y}\Vert ^{2}}{\sqrt{{\text {Var}}\left( T_{\textrm{dist1}}^{(k)}\right) }} {\mathop {\rightarrow }\limits ^{d}} \mathcal {N}(0,1), \ \text {as}\ p\rightarrow \infty ,\ \\ & \quad M_k\rightarrow \infty , \forall k\in \left\{ 1,\dots ,K\right\} . \end{aligned}$$

We seek the weights that minimize the mean squared error of the combined statistic:

$$\begin{aligned} \omega _{k}^*= & \arg \min _{\omega _{k}}\mathbb {E}\left( \sum _{k=1}^{K}\omega _{k}T_{\textrm{dist1}}^{(k)}-\Vert \varvec{\mu }_x-\varvec{\mu }_y\Vert ^2\right) ^2, \quad \\ & \quad {k}=1,\dots ,K. \end{aligned}$$

Using the method of Lagrange multipliers under the constraint \(\sum _{{k}=1}^{K}\omega _{k}=1\), we form

$$\begin{aligned} L_n\left( \omega _1,\dots ,\omega _K;\lambda \right)&=\mathbb {E}\left( \sum _{k=1}^{K}\omega _{k}T_{\textrm{dist1}}^{(k)}-\Vert \varvec{\mu }_x-\varvec{\mu }_y\Vert ^2\right) ^2\\ &\quad +2\lambda \left( \sum _{{k}=1}^{K}\omega _{k}-1\right) \\&=\sum _{{k}=1}^{K}\omega _{k}^2\mathbb {E}\left( T_{\textrm{dist1}}^{(k)}-\Vert \varvec{\mu }_x-\varvec{\mu }_y\Vert ^2\right) ^2\\ &\quad +2\lambda \left( \sum _{{k}=1}^{K}\omega _{k}-1\right) . \end{aligned}$$

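The second equality uses the constraint \(\sum _{k=1}^{K}\omega _{k}=1\), the unbiasedness \(\mathbb {E}\,T_{\textrm{dist1}}^{(k)}=\Vert \varvec{\mu }_x-\varvec{\mu }_y\Vert ^2\), and the independence of the nodes, so the cross terms vanish. Setting the partial derivatives of \(L_n\left( \omega _1,\dots ,\omega _K;\lambda \right) \) with respect to \(\omega _{k}\), \(k=1,\dots ,K\), and \(\lambda \) to zero gives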

$$\begin{aligned} {\left\{ \begin{array}{ll} \dfrac{\partial L_n\left( \omega _1,\dots ,\omega _K;\lambda \right) }{\partial \omega _1}& =2\omega _1\mathbb {E}\left( T_{\textrm{dist1}}^{(1)}-\Vert \varvec{\mu }_x-\varvec{\mu }_y\Vert ^2\right) ^2\\ & \quad +2\lambda =0,\\ & \vdots \\ \dfrac{\partial L_n\left( \omega _1,\dots ,\omega _K;\lambda \right) }{\partial \omega _K}& =2\omega _K\mathbb {E}\left( T_{\textrm{dist1}}^{(K)}-\Vert \varvec{\mu }_x-\varvec{\mu }_y\Vert ^2\right) ^2\\ & \quad +2\lambda =0,\\ \dfrac{\partial L_n\left( \omega _1,\dots ,\omega _K;\lambda \right) }{\partial \lambda }& =\sum _{{k}=1}^{K}\omega _{k}-1=0. \end{array}\right. } \end{aligned}$$

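Under \(H_0\), \(\mathbb {E}\left( T_{\textrm{dist1}}^{(k)}-\Vert \varvec{\mu }_x-\varvec{\mu }_y\Vert ^2\right) ^2=\sigma _\text {dist1}^{(k)2}\), so solving the system gives

$$\begin{aligned} \begin{aligned} \omega _{k}^*=&\dfrac{1}{\sigma _\text {dist1}^{(k)2}}/\left( \sum _{i=1}^{K}\dfrac{1}{\sigma _\text {dist1}^{(i)2}}\right) ,\quad k=1,\dots ,K, \quad \\ \lambda =&-\dfrac{1}{\sum _{k=1}^{K}1/\sigma _\text {dist1}^{(k)2}}. \end{aligned} \end{aligned}$$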

\(\square \)
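
The inverse-variance weights derived above can be illustrated with a short numerical sketch. It assumes NumPy and a hypothetical vector `sigma2` of per-node variances (the values are made up for illustration); it computes the weights and compares the resulting variance with that of randomly drawn weight vectors that also sum to one.

```python
import numpy as np

def inverse_variance_weights(sigma2):
    """Weights proportional to 1 / sigma2, normalised to sum to one."""
    inv = 1.0 / np.asarray(sigma2, dtype=float)
    return inv / inv.sum()

def combined_variance(weights, sigma2):
    """Variance of sum_k w_k T_k for independent T_k with variances sigma2."""
    return float(np.sum(np.asarray(weights) ** 2 * np.asarray(sigma2)))

sigma2 = np.array([0.5, 1.0, 2.0, 4.0])   # hypothetical per-node variances
w_star = inverse_variance_weights(sigma2)
v_star = combined_variance(w_star, sigma2)

# Random non-negative weight vectors on the simplex, for comparison
rng = np.random.default_rng(1)
trials = rng.dirichlet(np.ones(sigma2.size), size=1000)
v_trials = np.array([combined_variance(w, sigma2) for w in trials])
```

None of the random weight vectors should achieve a smaller variance than `v_star`, which equals the reciprocal of the sum of the inverse variances.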

A.2 Proof of Theorem 2

Proof

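Because \(\omega _{k}^*\) depends on the unknown variances \(\sigma _\text {dist1}^{(k)2}\), we estimate it by plugging in the estimators \(\hat{\sigma }_\text {dist1}^{(k)2}\):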

$$\begin{aligned} \begin{aligned} \hat{\omega }_k=\dfrac{1}{\hat{\sigma }_\text {dist1}^{(k)2}}/\left( \sum _{i=1}^{K}\dfrac{1}{\hat{\sigma }_\text {dist1}^{(i)2}}\right) ,\quad k=1,\dots ,K. \end{aligned} \end{aligned}$$

By Lemma 2 and the continuous mapping theorem,

$$\begin{aligned} \hat{\omega }_k{\mathop {\rightarrow }\limits ^{p}}\omega _k^*,\quad k=1,\dots ,K, \ \text {as}\ p\rightarrow \infty \ \text {and} \ M_k\rightarrow \infty . \end{aligned}$$

\(\square \)

A.3 Proof of Theorem 3

Proof

On every node,

$$\begin{aligned} \dfrac{T_{\textrm{dist1}}^{(k)}-\Vert \varvec{\mu }_{x}-\varvec{\mu }_{y}\Vert ^{2}}{\sqrt{{\text {Var}}\left( T_{\textrm{dist1}}^{(k)}\right) }} {\mathop {\rightarrow }\limits ^{d}} \mathcal {N}(0,1), \end{aligned}$$

and the nodes are mutually independent. Then, by Theorem 2, as \( p\rightarrow \infty \) and \(M_k\rightarrow \infty \),

$$\begin{aligned} T_{\textrm{dist1}}&=\sum _{k=1}^{K}\hat{\omega }_{k}T_{\textrm{dist1}}^{(k)}\\&=\sum _{k=1}^{K}\left( \hat{\omega }_{k}-\omega _{k}^*\right) T_{\textrm{dist1}}^{(k)}+\sum _{k=1}^{K}\omega _{k}^*T_{\textrm{dist1}}^{(k)}\\&{\mathop {\rightarrow }\limits ^{d}}\sum _{k=1}^{K}\omega _{k}^*T_{\textrm{dist1}}^{(k)}.\\ \end{aligned}$$

Because \(\sum _{k=1}^{K}\omega _{k}^*=1\), as \( p\rightarrow \infty \) and \(M_k\rightarrow \infty \) for all \(k\in \left\{ 1,\dots ,K\right\} \),

$$\begin{aligned} T_{\textrm{dist1}}{\mathop {\rightarrow }\limits ^{d}}\mathcal {N}\left( \Vert \varvec{\mu }_{x}-\varvec{\mu }_{y}\Vert ^{2},\sum _{k=1}^{K}\omega _{k}^{*2}{\text {Var}}\left( T_{\textrm{dist1}}^{(k)}\right) \right) , \end{aligned}$$

i.e.

$$\begin{aligned} \dfrac{T_{\textrm{dist1}}-\Vert \varvec{\mu }_{x}-\varvec{\mu }_{y}\Vert ^{2}}{\sqrt{{\text {Var}}\left( T_{\textrm{dist1}}\right) }} {\mathop {\rightarrow }\limits ^{d}} \mathcal {N}(0,1), \end{aligned}$$

where

$$\begin{aligned} {\text {Var}}\left( T_{\textrm{dist1}}\right) =&\sum _{k=1}^{K}\omega _{k}^{*2}{\text {Var}}\left( T_{\textrm{dist1}}^{(k)}\right) \\ =&\sum _{k=1}^{K}\dfrac{\left( 1/{\text {Var}}\left( T_{\textrm{dist1}}^{(k)}\right) ^2\right) }{\left( \sum _{k=1}^{K}1/{\text {Var}}\left( T_{\textrm{dist1}}^{(k)}\right) \right) ^2}{\text {Var}}\left( T_{\textrm{dist1}}^{(k)}\right) \\ =&\dfrac{1}{\left( \sum _{k=1}^{K}1/{\text {Var}}\left( T_{\textrm{dist1}}^{(k)}\right) \right) }\\ \end{aligned}$$

Under \(H_1\) and Assumptions 1–4, \({\text {Var}}\left( T_{\textrm{dist1}}^{(k)}\right) =\sigma _\textrm{dist1}^{(k)2}\left\{ 1+o(1)\right\} \), and the o(1) term disappears under \(H_0\). \(\square \)
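
As a sanity check of the variance formula above, consider the special case (worked out here for illustration, not stated in the source) in which all \(K\) nodes share the same local variance \(v\). Then

$$\begin{aligned} \omega _{k}^*=\dfrac{1/v}{K/v}=\dfrac{1}{K},\qquad {\text {Var}}\left( T_{\textrm{dist1}}\right) =\dfrac{1}{\sum _{k=1}^{K}1/v}=\dfrac{v}{K}, \end{aligned}$$

so in this balanced case the combined statistic is the plain average of the \(K\) independent local statistics and its variance shrinks by the factor \(1/K\).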

A.4 Proof of Theorem 4

Proof

We first compute \(\mathbb {E}\left( T_{\textrm{dist2}}\right) \) and \({\text {Var}}\left( T_{\textrm{dist2}}\right) \). Recall that

$$\begin{aligned} T_{\textrm{dist2}}= & \left( \varvec{\bar{X}}-\varvec{\bar{Y}}\right) ^{\prime }\left( \varvec{\bar{X}}-\varvec{\bar{Y}}\right) -n^{-2} \sum _{k=1}^{K}n_k{\text {tr}}\varvec{ S}_{x}^{(k)}\\ & -m^{-2} \sum _{\ell =1}^{L}m_\ell {\text {tr}}\varvec{S}_{y}^{(\ell )}, \end{aligned}$$

where

$$\begin{aligned} & {\text {tr}}\varvec{S}_x^{(k)}=\dfrac{1}{n_k}\sum _{i\in \mathcal {S}_x^k}\varvec{X}_i^{\prime }\varvec{X}_i-\dfrac{\sum _{i,j\in \mathcal {S}_x^k,i\ne j}\varvec{X}_{i}^{\prime } \varvec{X}_{j}}{n_k\left( n_k-1\right) }, \\ & {\text {tr}}\varvec{S}_y^{(\ell )}=\dfrac{1}{m_\ell }\sum _{i\in \mathcal {S}_y^\ell }\varvec{Y}_i^{\prime }\varvec{Y}_i-\dfrac{\sum _{i,j\in \mathcal {S}_y^\ell ,i \ne j}\varvec{Y}_{i}^{\prime } \varvec{Y}_{j}}{m_\ell \left( m_\ell -1\right) }. \end{aligned}$$

Here

$$\begin{aligned}&\mathbb {E}\left( \varvec{\bar{X}}-\varvec{\bar{Y}}\right) ^{\prime }\left( \varvec{\bar{X}}-\varvec{\bar{Y}}\right) =\dfrac{1}{n}{\text {tr}}\varvec{\Sigma }_x\\ &\quad +\dfrac{1}{m}{\text {tr}}\varvec{\Sigma }_y+\varvec{\mu }_x^{\prime }\varvec{\mu }_x+\varvec{\mu }_y^{\prime }\varvec{\mu }_y-2\varvec{\mu }_x^{\prime }\varvec{\mu }_y, \\&\mathbb {E}{\text {tr}}\varvec{S}_x^{(k)}={\text {tr}}\varvec{\Sigma }_x,\quad \mathbb {E}{\text {tr}}\varvec{S}_y^{(\ell )}={\text {tr}}\varvec{\Sigma }_y. \end{aligned}$$

Thus

$$\begin{aligned} \mathbb {E}\left( T_{\textrm{dist2}}\right) =\varvec{\mu }_x^{\prime }\varvec{\mu }_x+\varvec{\mu }_y^{\prime }\varvec{\mu }_y-2\varvec{\mu }_x^{\prime }\varvec{\mu }_y=\Vert \varvec{\mu }_x-\varvec{\mu }_y\Vert ^2. \end{aligned}$$

To compute the variance, we expand \(T_{\textrm{dist2}}\) into sums of cross products:

$$\begin{aligned} T_{\textrm{dist2}}&=\left( \varvec{\bar{X}}-\varvec{\bar{Y}}\right) ^{\prime }\left( \varvec{\bar{X}}-\varvec{\bar{Y}}\right) -n^{-2} \sum _{k=1}^{K}n_k{\text {tr}}\varvec{ S}_{x}^{(k)}\\ &\quad -m^{-2} \sum _{\ell =1}^{L}m_\ell {\text {tr}}\varvec{S}_{y}^{(\ell )}\\&=\dfrac{1}{n^2}\sum _{i,j\in \mathcal {S}_x,i\ne j}\varvec{ X}_i^{\prime }\varvec{ X}_j+\dfrac{1}{n^2}\sum _{k=1}^{K}\sum _{i,j\in \mathcal {S}_x^k,i \ne j}\dfrac{\varvec{ X}_i^{\prime }\varvec{ X}_j}{n_k-1}\\ &\quad -2\dfrac{\sum _{i\in \mathcal {S}_x}\sum _{j\in \mathcal {S}_y}\varvec{X}_i^{\prime }\varvec{Y}_j}{nm}\\&\quad +\dfrac{1}{m^2}\sum _{i,j\in \mathcal {S}_y,i\ne j}\varvec{ Y}_i^{\prime }\varvec{ Y}_j+\dfrac{1}{m^2}\sum _{\ell =1}^{L}\sum _{i,j\in \mathcal {S}_y^\ell ,i \ne j}\dfrac{\varvec{ Y}_i^{\prime }\varvec{ Y}_j}{m_\ell -1}.\\ \end{aligned}$$

Let

$$\begin{aligned} P_1= & \dfrac{1}{n^2}\sum _{i,j\in \mathcal {S}_x,i\ne j}\varvec{ X}_i^{\prime }\varvec{ X}_j,\quad \\ P_2= & \dfrac{1}{n^2}\sum _{k=1}^{K}\sum _{i,j\in \mathcal {S}_x^k,i \ne j}\dfrac{\varvec{ X}_i^{\prime }\varvec{ X}_j}{n_k-1}, \\ P_3= & -2\dfrac{\sum _{i\in \mathcal {S}_x}\sum _{j\in \mathcal {S}_y}\varvec{X}_i^{\prime }\varvec{Y}_j}{nm}, \\ P_4= & \dfrac{1}{m^2}\sum _{i,j\in \mathcal {S}_y,i\ne j}\varvec{ Y}_i^{\prime }\varvec{ Y}_j, \quad \\ P_5= & \dfrac{1}{m^2}\sum _{\ell =1}^{L}\sum _{i,j\in \mathcal {S}_y^\ell ,i \ne j}\dfrac{\varvec{ Y}_i^{\prime }\varvec{ Y}_j}{m_\ell -1}. \end{aligned}$$
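
The decomposition \(T_{\textrm{dist2}}=P_1+\dots +P_5\) can be verified on simulated data. The sketch below assumes NumPy, Gaussian samples, and arbitrary node sizes chosen for illustration; the helper names (`offdiag_sum`, `t_dist2_direct`, `t_dist2_parts`) are hypothetical.

```python
import numpy as np

def offdiag_sum(A, B=None):
    """Sum of a_i' b_j over all i, j; if B is None, over i != j within A."""
    if B is None:
        G = A @ A.T
        return G.sum() - np.trace(G)
    return (A @ B.T).sum()

def t_dist2_direct(X_nodes, Y_nodes):
    """T_dist2 from pooled means and node-level covariance traces."""
    X, Y = np.vstack(X_nodes), np.vstack(Y_nodes)
    n, m = X.shape[0], Y.shape[0]
    d = X.mean(axis=0) - Y.mean(axis=0)
    tx = sum(Xk.shape[0] * Xk.var(axis=0, ddof=1).sum() for Xk in X_nodes)
    ty = sum(Yl.shape[0] * Yl.var(axis=0, ddof=1).sum() for Yl in Y_nodes)
    return d @ d - tx / n**2 - ty / m**2

def t_dist2_parts(X_nodes, Y_nodes):
    """The five terms P_1, ..., P_5 of the cross-product expansion."""
    X, Y = np.vstack(X_nodes), np.vstack(Y_nodes)
    n, m = X.shape[0], Y.shape[0]
    p1 = offdiag_sum(X) / n**2
    p2 = sum(offdiag_sum(Xk) / (Xk.shape[0] - 1) for Xk in X_nodes) / n**2
    p3 = -2 * offdiag_sum(X, Y) / (n * m)
    p4 = offdiag_sum(Y) / m**2
    p5 = sum(offdiag_sum(Yl) / (Yl.shape[0] - 1) for Yl in Y_nodes) / m**2
    return np.array([p1, p2, p3, p4, p5])

rng = np.random.default_rng(2)
X_nodes = [rng.normal(1.0, 1.0, size=(nk, 30)) for nk in (5, 7, 4)]   # K = 3
Y_nodes = [rng.normal(0.0, 2.0, size=(ml, 30)) for ml in (6, 3)]      # L = 2
direct = t_dist2_direct(X_nodes, Y_nodes)
parts = t_dist2_parts(X_nodes, Y_nodes)
```

Up to floating-point error, `parts.sum()` should agree with `direct`.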

Then

$$\begin{aligned} {\text {Var}}\left( P_1\right) & =\dfrac{2\left( n-1\right) }{n^3}{\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) +\dfrac{4\left( n-1\right) ^2\varvec{\mu }_x^{\prime }\varvec{\Sigma }_{x}\varvec{\mu }_x}{n^3}, \\ {\text {Var}}\left( P_2\right) & =\dfrac{1}{n^4}\sum _{k=1}^{K}\dfrac{2n_k}{n_k-1}{\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) +\dfrac{4\varvec{\mu }_x^{\prime }\varvec{\Sigma }_{x}\varvec{\mu }_x}{n^3}, \\ {\text {Var}}\left( P_3\right) & =\dfrac{4}{nm}{\text {tr}}\left( \varvec{\Sigma }_{x}\varvec{\Sigma }_{y}\right) +\dfrac{4\varvec{\mu }_x^{\prime }\varvec{\Sigma }_{y}\varvec{\mu }_x}{m}+\dfrac{4\varvec{\mu }_y^{\prime }\varvec{\Sigma }_{x}\varvec{\mu }_y}{n}, \\ {\text {Var}}\left( P_4\right) & =\dfrac{2\left( m-1\right) }{m^3}{\text {tr}}\left( \varvec{\Sigma }_{y}^2\right) +\dfrac{4\left( m-1\right) ^2\varvec{\mu }_y^{\prime }\varvec{\Sigma }_{y}\varvec{\mu }_y}{m^3}, \\ {\text {Var}}\left( P_5\right) & =\dfrac{1}{m^4}\sum _{\ell =1}^{L}\dfrac{2m_\ell }{m_\ell -1}{\text {tr}}\left( \varvec{\Sigma }_{y}^2\right) +\dfrac{4\varvec{\mu }_y^{\prime }\varvec{\Sigma }_{y}\varvec{\mu }_y}{m^3}, \end{aligned}$$

Since the samples \(\varvec{\mathcal {X}}_{n}\) and \(\varvec{\mathcal {Y}}_{m}\) are independent, \({\text {Cov}}\left( P_1,P_4\right) =0\), \({\text {Cov}}\left( P_1,P_5\right) =0\), \({\text {Cov}}\left( P_2,P_4\right) =0\) and \({\text {Cov}}\left( P_2,P_5\right) =0\). Because the samples on different nodes are also independent, we obtain the following covariances.

$$\begin{aligned}&{\text {Cov}}\left( P_1,P_2\right) \\ &\quad =\dfrac{1}{n^4}\sum _{k=1}^{K}{\text {Cov}}\left( \sum _{i,j\in \mathcal {S}_x^k,i\ne j}\varvec{ X}_i^{\prime }\varvec{ X}_j,\sum _{i,j\in \mathcal {S}_x^k,i\ne j}\dfrac{\varvec{ X}_i^{\prime }\varvec{ X}_j}{n_k-1}\right) \\&\qquad +\dfrac{1}{n^4}\sum _{k_1\ne k_2}^{K}{\text {Cov}}\left( \sum _{i,j\in \mathcal {S}_x^{k_1},i\ne j}\varvec{ X}_i^{\prime }\varvec{ X}_j,\sum _{i,j\in \mathcal {S}_x^{k_2},i\ne j}\dfrac{\varvec{ X}_i^{\prime }\varvec{ X}_j}{n_{k_2}-1}\right) \\&\qquad +\dfrac{1}{n^4}{\text {Cov}}\left( \sum _{k_1\ne k_2}^{K}\sum _{i\in \mathcal {S}_x^{k_1}}\sum _{j\in \mathcal {S}_x^{k_2}}\varvec{X}_i^{\prime }\varvec{X}_j, \sum _{k=1}^{K}\sum _{i,j\in \mathcal {S}_x^{k},i\ne j}\dfrac{\varvec{ X}_i^{\prime }\varvec{ X}_j}{n_k-1}\right) \\&\quad =\dfrac{2}{n^3}{\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) +\dfrac{1}{n^4}\sum _{k=1}^{K}4n_k\left( n_k-1\right) \varvec{\mu }_x^{\prime }\varvec{\Sigma }_{x}\varvec{\mu }_x\\ &\qquad +\dfrac{1}{n^4}\sum _{k_1\ne k_2}^{K}4n_{k_1}n_{k_2}\varvec{\mu }_x^{\prime }\varvec{\Sigma }_{x}\varvec{\mu }_x\\&\quad =\dfrac{2}{n^3}{\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) +\dfrac{4\left( n-1\right) }{n^3}\varvec{\mu }_x^{\prime }\varvec{\Sigma }_{x}\varvec{\mu }_x, \end{aligned}$$
$$\begin{aligned} {\text {Cov}}\left( P_1,P_3\right) & =-\dfrac{4\left( n-1\right) \varvec{\mu }_x^{\prime }\varvec{\Sigma }_{x}\varvec{\mu }_y}{n^2}, \\ {\text {Cov}}\left( P_2,P_3\right) & =-\dfrac{4\varvec{\mu }_x^{\prime }\varvec{\Sigma }_x\varvec{\mu }_y}{n^2}, \end{aligned}$$
$$\begin{aligned}&{\text {Cov}}\left( P_4,P_5\right) \\ &\quad =\dfrac{1}{m^4}\sum _{\ell =1}^{L}{\text {Cov}}\left( \sum _{i,j\in \mathcal {S}_y^\ell ,i\ne j}\varvec{ Y}_i^{\prime }\varvec{ Y}_j,\sum _{i,j\in \mathcal {S}_y^\ell ,i\ne j}\dfrac{\varvec{ Y}_i^{\prime }\varvec{ Y}_j}{m_\ell -1}\right) \\&\qquad +\dfrac{1}{m^4}\sum _{\ell _1\ne \ell _2}^{L}{\text {Cov}}\left( \sum _{i,j\in \mathcal {S}_y^{\ell _1},i\ne j}\varvec{ Y}_i^{\prime }\varvec{ Y}_j,\sum _{i,j\in \mathcal {S}_y^{\ell _2},i\ne j}\dfrac{\varvec{ Y}_i^{\prime }\varvec{ Y}_j}{m_{\ell _2}-1}\right) \\&\qquad +\dfrac{1}{m^4}{\text {Cov}}\left( \sum _{\ell _1\ne \ell _2}^{L}\sum _{i\in \mathcal {S}_y^{\ell _1}}\sum _{j\in \mathcal {S}_y^{\ell _2}}\varvec{Y}_i^{\prime }\varvec{Y}_j, \sum _{\ell =1}^{L}\sum _{i,j\in \mathcal {S}_y^{\ell },i\ne j}\dfrac{\varvec{ Y}_i^{\prime }\varvec{ Y}_j}{m_\ell -1}\right) \\&\quad =\dfrac{2}{m^3}{\text {tr}}\left( \varvec{\Sigma }_{y}^2\right) +\dfrac{1}{m^4}\sum _{\ell =1}^{L}4m_\ell \left( m_\ell -1\right) \varvec{\mu }_y^{\prime }\varvec{\Sigma }_{y}\varvec{\mu }_y\\ &\qquad +\dfrac{1}{m^4}\sum _{\ell _1\ne \ell _2}^{L}4m_{\ell _1}m_{\ell _2}\varvec{\mu }_y^{\prime }\varvec{\Sigma }_{y}\varvec{\mu }_y\\&\quad =\dfrac{2}{m^3}{\text {tr}}\left( \varvec{\Sigma }_{y}^2\right) +\dfrac{4\left( m-1\right) }{m^3}\varvec{\mu }_y^{\prime }\varvec{\Sigma }_{y}\varvec{\mu }_y, \end{aligned}$$
$$\begin{aligned} {\text {Cov}}\left( P_4,P_3\right) & =-\dfrac{4\left( m-1\right) \varvec{\mu }_y^{\prime }\varvec{\Sigma }_{y}\varvec{\mu }_x}{m^2},\\ {\text {Cov}}\left( P_5,P_3\right) & =-\dfrac{4\varvec{\mu }_y^{\prime }\varvec{\Sigma }_y\varvec{\mu }_x}{m^2}, \end{aligned}$$

In summary,

$$\begin{aligned}&{\text {Var}}\left( T_{\textrm{dist2}}\right) \\ &\quad =\left( \dfrac{2\left( n-1\right) }{n^3}+\dfrac{1}{n^4}\sum _{k=1}^{K}\dfrac{2n_k}{n_k-1}+\dfrac{4}{n^3}\right) {\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) \\ &\qquad +\dfrac{4}{nm}{\text {tr}}\left( \varvec{\Sigma }_{x}\varvec{\Sigma }_{y}\right) \\&\qquad +\left( \dfrac{2\left( m-1\right) }{m^3}+\dfrac{1}{m^4}\sum _{\ell =1}^{L}\dfrac{2m_\ell }{m_\ell -1}+\dfrac{4}{m^3}\right) {\text {tr}}\left( \varvec{\Sigma }_{y}^2\right) \\&\qquad +\left( \dfrac{4\left( n-1\right) ^2}{n^3}+\dfrac{4}{n^3}+\dfrac{8\left( n-1\right) }{n^3}\right) \varvec{\mu }_x^{\prime }\varvec{\Sigma }_{x}\varvec{\mu }_x\\&\qquad +\dfrac{4\varvec{\mu }_x^{\prime }\varvec{\Sigma }_{y}\varvec{\mu }_x}{m}+\dfrac{4\varvec{\mu }_y^{\prime }\varvec{\Sigma }_{x}\varvec{\mu }_y}{n}-\dfrac{8}{n}\varvec{\mu }_x^{\prime }\varvec{\Sigma }_x\varvec{\mu }_y-\dfrac{8}{m}\varvec{\mu }_y^{\prime }\varvec{\Sigma }_y\varvec{\mu }_x\\&\qquad +\left( \dfrac{4\left( m-1\right) ^2}{m^3}+\dfrac{4}{m^3}+\dfrac{8\left( m-1\right) }{m^3}\right) \varvec{\mu }_y^{\prime }\varvec{\Sigma }_{y}\varvec{\mu }_y\\&\quad =\dfrac{2}{n\left( n-1\right) }{\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) +\dfrac{2}{m\left( m-1\right) }{\text {tr}}\left( \varvec{\Sigma }_{y}^2\right) \\ &\qquad +\dfrac{4}{nm}{\text {tr}}\left( \varvec{\Sigma }_{x}\varvec{\Sigma }_{y}\right) \\&\qquad +\dfrac{4}{n}\left( \varvec{\mu }_x-\varvec{\mu }_y\right) ^{\prime }\varvec{\Sigma }_x\left( \varvec{\mu }_x-\varvec{\mu }_y\right) \\ &\qquad +\dfrac{4}{m}\left( \varvec{\mu }_x-\varvec{\mu }_y\right) ^{\prime }\varvec{\Sigma }_y\left( \varvec{\mu }_x-\varvec{\mu }_y\right) \\&\qquad +\left( \dfrac{1}{n^4}\sum _{k=1}^{K}\dfrac{2n_k}{n_k-1}-\dfrac{2}{n^3\left( n-1\right) }\right) {\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) \\&\qquad +\left( \dfrac{1}{m^4}\sum _{\ell =1}^{L}\dfrac{2m_\ell }{m_\ell -1}-\dfrac{2}{m^3\left( m-1\right) }\right) {\text {tr}}\left( 
\varvec{\Sigma }_{y}^2\right) .\\ \end{aligned}$$
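
The regrouping of coefficients in the last step is pure algebra and can be confirmed with exact rational arithmetic. The following is a minimal sketch using Python's standard `fractions` module; the node sizes are hypothetical and the function names are introduced here only for illustration.

```python
from fractions import Fraction

def tr_sigma2_coeff_raw(n, node_sizes):
    """Coefficient of tr(Sigma_x^2) before simplification."""
    n = Fraction(n)
    s = sum(Fraction(2 * nk, nk - 1) for nk in node_sizes)
    return 2 * (n - 1) / n**3 + s / n**4 + 4 / n**3

def tr_sigma2_coeff_simplified(n, node_sizes):
    """The same coefficient regrouped around 2 / (n (n - 1))."""
    n = Fraction(n)
    s = sum(Fraction(2 * nk, nk - 1) for nk in node_sizes)
    return 2 / (n * (n - 1)) + s / n**4 - 2 / (n**3 * (n - 1))

def quad_form_coeff_raw(n):
    """Coefficient of mu_x' Sigma_x mu_x before simplification."""
    n = Fraction(n)
    return 4 * (n - 1) ** 2 / n**3 + 4 / n**3 + 8 * (n - 1) / n**3

node_sizes = [5, 7, 4]      # hypothetical n_k, so n = 16
n = sum(node_sizes)
raw = tr_sigma2_coeff_raw(n, node_sizes)
simplified = tr_sigma2_coeff_simplified(n, node_sizes)
quad = quad_form_coeff_raw(n)
```

Here `raw` and `simplified` coincide exactly, and `quad` reduces to \(4/n\), matching the coefficient of the quadratic forms in the final expression.
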

Fig. 14 Impact of dimension on size when the sample obeys \((\chi _2^2-2)/2\) (Case 1)

Thus, under \(H_0\),

$$\begin{aligned} {\text {Var}}\left( T_{\textrm{dist2}}\right)&=\dfrac{2}{n\left( n-1\right) }{\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) +\dfrac{2}{m\left( m-1\right) }{\text {tr}}\left( \varvec{\Sigma }_{y}^2\right) \\ &\quad +\dfrac{4}{nm}{\text {tr}}\left( \varvec{\Sigma }_{x}\varvec{\Sigma }_{y}\right) \\&\quad +\left( \dfrac{1}{n^4}\sum _{k=1}^{K}\dfrac{2n_k}{n_k-1}-\dfrac{2}{n^3\left( n-1\right) }\right) {\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) \\&\quad +\left( \dfrac{1}{m^4}\sum _{\ell =1}^{L}\dfrac{2m_\ell }{m_\ell -1}-\dfrac{2}{m^3\left( m-1\right) }\right) {\text {tr}}\left( \varvec{\Sigma }_{y}^2\right) \\&=\dfrac{2}{n\left( n-1\right) }{\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) \left\{ 1+O\left( \dfrac{K}{n^2}\right) \right\} \\ &\quad +\dfrac{2}{m\left( m-1\right) }{\text {tr}}\left( \varvec{\Sigma }_{y}^2\right) \left\{ 1+O\left( \dfrac{L}{m^2}\right) \right\} \\&\quad +\dfrac{4}{nm}{\text {tr}}\left( \varvec{\Sigma }_{x}\varvec{\Sigma }_{y}\right) . \end{aligned}$$

Under \(H_1\),

$$\begin{aligned} {\text {Var}}\left( T_{\textrm{dist2}}\right) =\sigma _\textrm{dist2}^2\left\{ 1+o(1)\right\} . \end{aligned}$$

Here, under Assumption 2,

$$\begin{aligned} \sigma _\textrm{dist2}^2&=\dfrac{2}{n\left( n-1\right) } {\text {tr}}\left( \varvec{\Sigma }_{x}^{2}\right) +\dfrac{2}{m\left( m-1\right) } {\text {tr}}\left( \varvec{\Sigma }_{y}^{2}\right) \\ &\quad +\dfrac{4}{nm} {\text {tr}}\left( \varvec{\Sigma }_{x} \varvec{\Sigma }_{y}\right) \\&\quad +\left( \dfrac{1}{n^4}\sum _{k=1}^{K}\dfrac{2n_k}{n_k-1}-\dfrac{2}{n^3\left( n-1\right) }\right) {\text {tr}}\left( \varvec{\Sigma }_{x}^2\right) \\&\quad +\left( \dfrac{1}{m^4}\sum _{\ell =1}^{L}\dfrac{2m_\ell }{m_\ell -1}-\dfrac{2}{m^3\left( m-1\right) }\right) {\text {tr}}\left( \varvec{\Sigma }_{y}^2\right) .\\ \end{aligned}$$

We now establish the asymptotic normality of \(T_{\textrm{dist2}}\). Let

$$\begin{aligned} T&=T_{\textrm{dist2}}-T_{\textrm{cq}}\\&=n^{-1}{\text {tr}}\varvec{ S}_{x}+m^{-1}{\text {tr}}\varvec{S}_{y}-n^{-2} \sum _{k=1}^{K}n_k{\text {tr}}\varvec{ S}_{x}^{(k)}\\ &\quad -m^{-2} \sum _{\ell =1}^{L}m_\ell {\text {tr}}\varvec{S}_{y}^{(\ell )}\\&=n^{-1} \left( {\text {tr}}\varvec{ S}_{x}-\sum _{k=1}^{K}\dfrac{n_k}{n}{\text {tr}}\varvec{ S}_{x}^{(k)}\right) \\ &\quad +m^{-1} \left( {\text {tr}}\varvec{ S}_{y}-\sum _{\ell =1}^{L}\dfrac{m_\ell }{m}{\text {tr}}\varvec{ S}_{y}^{(\ell )}\right) \end{aligned}$$

Since \(\varvec{ S}_{x}{\mathop {\rightarrow }\limits ^{p}}\varvec{\Sigma }_{x}\) as \(n\rightarrow \infty \) and \(\varvec{S}_{x}^{(k)}{\mathop {\rightarrow }\limits ^{p}}\varvec{\Sigma }_{x}\) as \(n_k\rightarrow \infty \), it follows that, as \(n_k\rightarrow \infty \) with \(n=\sum _{k=1}^{K}n_k\),

$$\begin{aligned} {\text {tr}}\varvec{ S}_{x}-\sum _{k=1}^{K}\dfrac{n_k}{n}{\text {tr}}\varvec{ S}_{x}^{(k)}{\mathop {\rightarrow }\limits ^{p}}0. \end{aligned}$$

Similarly, as \(m_\ell \rightarrow \infty \), \(m=\sum _{\ell =1}^{L}m_\ell \),

$$\begin{aligned} {\text {tr}}\varvec{ S}_{y}-\sum _{\ell =1}^{L}\dfrac{m_\ell }{m}{\text {tr}}\varvec{ S}_{y}^{(\ell )}{\mathop {\rightarrow }\limits ^{p}}0. \end{aligned}$$

So

$$\begin{aligned} T_{\textrm{dist2}}-T_{\textrm{cq}}{\mathop {\rightarrow }\limits ^{p}}0, \end{aligned}$$

and we know that \(\dfrac{T_{\textrm{cq}}-\Vert \varvec{\mu }_{x}-\varvec{\mu }_{y}\Vert ^{2}}{\sqrt{{\text {Var}}\left( T_{\textrm{cq}}\right) }} {\mathop {\rightarrow }\limits ^{d}} \mathcal {N}(0,1).\) Finally, by Slutsky's theorem, we have

$$\begin{aligned} \dfrac{T_{\textrm{dist2}}-\Vert \varvec{\mu }_{x}-\varvec{\mu }_{y}\Vert ^{2}}{\sqrt{{\text {Var}}\left( T_{\textrm{dist2}}\right) }} {\mathop {\rightarrow }\limits ^{d}} \mathcal {N}(0,1). \end{aligned}$$

\(\square \)
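
The expression for \(T_{\textrm{dist2}}-T_{\textrm{cq}}\) used in the proof above can be checked numerically. The sketch below assumes NumPy and simulated data, and writes \(T_{\textrm{cq}}\) in the mean-and-trace form implied by the displayed difference; all function names are hypothetical.

```python
import numpy as np

def tr_cov(A):
    """Trace of the sample covariance matrix (divisor n - 1)."""
    return A.var(axis=0, ddof=1).sum()

def t_cq(X, Y):
    """Full-sample statistic in mean-and-trace form."""
    d = X.mean(axis=0) - Y.mean(axis=0)
    return d @ d - tr_cov(X) / X.shape[0] - tr_cov(Y) / Y.shape[0]

def t_dist2(X_nodes, Y_nodes):
    """Distributed statistic built from pooled means and node-level traces."""
    X, Y = np.vstack(X_nodes), np.vstack(Y_nodes)
    n, m = X.shape[0], Y.shape[0]
    d = X.mean(axis=0) - Y.mean(axis=0)
    tx = sum(Xk.shape[0] * tr_cov(Xk) for Xk in X_nodes)
    ty = sum(Yl.shape[0] * tr_cov(Yl) for Yl in Y_nodes)
    return d @ d - tx / n**2 - ty / m**2

def trace_gap(nodes):
    """tr S minus the size-weighted average of the node-level traces."""
    A = np.vstack(nodes)
    n = A.shape[0]
    return tr_cov(A) - sum(Ak.shape[0] / n * tr_cov(Ak) for Ak in nodes)

rng = np.random.default_rng(3)
X_nodes = [rng.normal(size=(nk, 40)) for nk in (10, 12, 8)]
Y_nodes = [rng.normal(size=(ml, 40)) * 1.5 for ml in (9, 11)]
X, Y = np.vstack(X_nodes), np.vstack(Y_nodes)
n, m = X.shape[0], Y.shape[0]

lhs = t_dist2(X_nodes, Y_nodes) - t_cq(X, Y)
rhs = trace_gap(X_nodes) / n + trace_gap(Y_nodes) / m
```

The two quantities `lhs` and `rhs` should agree up to rounding, which is the identity the proof relies on before taking limits.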

A.5 Proof of Theorem 5

Proof

By Lemma 2 in Hu et al. (2017), under Model II and Assumptions 1, 2, 5, 6, as \(p\rightarrow \infty \), \(n_k\rightarrow \infty \) and \(m_\ell \rightarrow \infty \),

$$\begin{aligned} & \dfrac{\widehat{{\text {tr}}\left( \varvec{\Sigma }_{x}^{2}\right) }^{(k)}}{{\text {tr}}\left( \varvec{\Sigma }_{x}^{2}\right) }{\mathop {\rightarrow }\limits ^{p}}1, \quad k=1,\dots , K,\quad \\ & \dfrac{\widehat{{\text {tr}}\left( \varvec{\Sigma }_{y}^{2}\right) }^{(\ell )}}{{\text {tr}}\left( \varvec{\Sigma }_{y}^{2}\right) }{\mathop {\rightarrow }\limits ^{p}}1, \quad \ell =1,\dots , L, \end{aligned}$$

and

$$\begin{aligned} & \dfrac{\widehat{{\text {tr}} \left( \varvec{\Sigma }_{x} \varvec{\Sigma }_{y}\right) }_{d2}}{{\text {tr}} \left( \varvec{\Sigma }_{x} \varvec{\Sigma }_{y}\right) }{\mathop {\rightarrow }\limits ^{p}}1,\ \text {for }\ \forall k\in \{1,\dots ,K\} \ \text {and }\ \\ & \quad \forall \ell \in \{1,\dots ,L\}. \end{aligned}$$

Then as \(p\rightarrow \infty \), \(n_k\rightarrow \infty \) and \(m_\ell \rightarrow \infty \),

$$\begin{aligned} & \dfrac{\widehat{{\text {tr}}\left( \varvec{\Sigma }_{x}^{2}\right) }_{d2}}{{\text {tr}}\left( \varvec{\Sigma }_{x}^{2}\right) }=\sum _{k=1}^{K}\dfrac{n_k}{n}\dfrac{\widehat{{\text {tr}}\left( \varvec{\Sigma }_{x}^{2}\right) }^{(k)}}{{\text {tr}}\left( \varvec{\Sigma }_{x}^{2}\right) }{\mathop {\rightarrow }\limits ^{p}}1, \quad k=1,\dots , K, \\ & \dfrac{\widehat{{\text {tr}}\left( \varvec{\Sigma }_{y}^{2}\right) }_{d2}}{{\text {tr}}\left( \varvec{\Sigma }_{y}^{2}\right) }=\sum _{\ell =1}^{L}\dfrac{m_\ell }{m}\dfrac{\widehat{{\text {tr}}\left( \varvec{\Sigma }_{y}^{2}\right) }^{(\ell )}}{{\text {tr}}\left( \varvec{\Sigma }_{y}^{2}\right) }{\mathop {\rightarrow }\limits ^{p}}1, \quad \ell =1,\dots , L. \end{aligned}$$

\(\square \)
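
The last step of the proof above can be spelled out as follows; this is a sketch under the additional assumption, made here for illustration, that \(K\) is fixed. Writing \(r_k=\widehat{{\text {tr}}\left( \varvec{\Sigma }_{x}^{2}\right) }^{(k)}/{\text {tr}}\left( \varvec{\Sigma }_{x}^{2}\right) \) and using \(\sum _{k=1}^{K}n_k/n=1\),

$$\begin{aligned} \left| \sum _{k=1}^{K}\dfrac{n_k}{n}r_k-1\right| =\left| \sum _{k=1}^{K}\dfrac{n_k}{n}\left( r_k-1\right) \right| \le \max _{1\le k\le K}\left| r_k-1\right| {\mathop {\rightarrow }\limits ^{p}}0, \end{aligned}$$

because each \(r_k{\mathop {\rightarrow }\limits ^{p}}1\) and the maximum is taken over finitely many terms. The same bound applies to the \(\varvec{Y}\) sample with weights \(m_\ell /m\).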

Appendix B Supplementary figures

B.1 Supplementary figures of the impact of dimension

See Figs. 14, 15, 16, 17, 18, 19, 20 and 21.

Fig. 15 Impact of dimension on power when the sample obeys \((\chi _2^2-2)/2\) (Case 1)

Fig. 16 Impact of dimension on size when the sample obeys \((\chi _2^2-2)/2\) (Case 2)

Fig. 17 Impact of dimension on power when the sample obeys \((\chi _2^2-2)/2\) (Case 2)

Fig. 18 Impact of dimension on size when the sample obeys \((\chi _8^2-8)/4\) (Case 1)

Fig. 19 Impact of dimension on power when the sample obeys \((\chi _8^2-8)/4\) (Case 1)

Fig. 20 Impact of dimension on size when the sample obeys \((\chi _8^2-8)/4\) (Case 2)

Fig. 21 Impact of dimension on power when the sample obeys \((\chi _8^2-8)/4\) (Case 2)

B.2 Supplementary figures of the impact of the number of nodes

See Figs. 22, 23, 24, 25, 26, 27, 28 and 29.

Fig. 22 Impact of the number of nodes on size when the sample obeys \((\chi _2^2-2)/2\) (Case 1)

Fig. 23 Impact of the number of nodes on power when the sample obeys \((\chi _2^2-2)/2\) (Case 1)

Fig. 24 Impact of the number of nodes on size when the sample obeys \((\chi _2^2-2)/2\) (Case 2)

Fig. 25 Impact of the number of nodes on power when the sample obeys \((\chi _2^2-2)/2\) (Case 2)

Fig. 26 Impact of the number of nodes on size when the sample obeys \((\chi _8^2-8)/4\) (Case 1)

Fig. 27 Impact of the number of nodes on power when the sample obeys \((\chi _8^2-8)/4\) (Case 1)

Fig. 28 Impact of the number of nodes on size when the sample obeys \((\chi _8^2-8)/4\) (Case 2)

Fig. 29 Impact of the number of nodes on power when the sample obeys \((\chi _8^2-8)/4\) (Case 2)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yan, L., Hu, J. & Wu, L. Distributed hypothesis testing for large dimensional two-sample mean vectors. Stat Comput 34, 187 (2024). https://doi.org/10.1007/s11222-024-10489-3

  • DOI: https://doi.org/10.1007/s11222-024-10489-3
