Distributed identification of heterogeneous treatment effects

Zhang, Shuang; Feng, Xingdong

doi:10.1007/s00180-021-01114-2

Distributed identification of heterogeneous treatment effects

Original paper
Published: 29 May 2021

Volume 37, pages 57–89, (2022)
Cite this article

Computational Statistics Aims and scope Submit manuscript

489 Accesses
2 Citations
Explore all metrics

Abstract

In many areas including precise medical treatments and financial investments, analysis of heterogeneous treatment effects has become important. In this paper, we focus on identifying subgroups by combining data in a distributed storage system. We propose a distributed algorithm based on the alternating direction method of multipliers, which can well preserve privacy of subjects. This method can deal with large-scale data, and perform well in identifying subgroups if there exist sufficient samples in a whole distributed storage system but not necessarily in every computing node. Our numerical study indicates that the proposed method is promising in many interesting cases.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Communication-efficient estimation for distributed subset selection

Article 16 October 2023

Multiple imputation for analysis of incomplete data in distributed health data networks

Article Open access 29 October 2020

DLMM as a lossless one-shot algorithm for collaborative multi-site distributed linear mixed models

Article Open access 30 March 2022

References

Abe M, Nakagawa K (2019) Deep learning for multi-factor models in regional and global stock markets. In: JSAI international symposium on artificial intelligence. Springer, pp 87–102
Abe M, Nakagawa K (2020) Cross-sectional stock price prediction using deep learning for actual investment management. arXiv preprint arXiv:2002.06975
Becker SO, Egger PH, Von Ehrlich M (2013) Absorptive capacity and the growth and investment effects of regional transfers: a regression discontinuity design with heterogeneous treatment effects. Am Econ J Econ Policy 5(4):29–77
Article Google Scholar
Bottou L (2010) Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT’2010. Springer, pp 177–186
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J et al (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends® Mach Learn 3(1):1–122
MATH Google Scholar
Dewandaru G, Rizvi SAR, Masih R, Masih M, Alhabshi SO (2014) Stock market co-movements: Islamic versus conventional equity indices with multi-timescales analysis. Econ Syst 38(4):553–571
Article Google Scholar
Eryiğit M, Eryiğit R (2009) Network structure of cross-correlations among the world market indices. Physica A 388(17):3551–3562
Article Google Scholar
Everitt B, Hand D (1981) Finite mixture distributions. Chapman and Hall, New York
Book Google Scholar
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360
Article MathSciNet Google Scholar
Farewell VT (1982) The use of mixture models for the analysis of survival data with long-term survivors. Biometrics 38:1041–1046
Article Google Scholar
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference and prediction, 2nd edn. Springer, Berlin
Book Google Scholar
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Lee JYL, Brown JJ, Ryan LM (2017) Sufficiency revisited: rethinking statistical algorithms in the big data era. Am Stat 71:202–208
Article MathSciNet Google Scholar
Lu Z, Leen TK (2007) Penalized probabilistic clustering. Neural Comput 19(6):1528–1567
Article MathSciNet Google Scholar
Ma S, Huang J (2017) A concave pairwise fusion approach to subgroup analysis. J Am Stat Assoc 112(517):410–423
Article MathSciNet Google Scholar
McAuliffe JD, Blei DM, Jordan MI (2006) Nonparametric empirical Bayes for the Dirichlet process mixture model. Stat Comput 16(1):5–14
Article MathSciNet Google Scholar
Ouyang H, He N, Tran L, Gray A (2013) Stochastic alternating direction method of multipliers. In: International conference on machine learning, pp 80–88
Pauler DK, Laird NM (2000) A mixture model for longitudinal data with application to assessment of noncompliance. Biometrics 56(2):464–472
Article Google Scholar
Qiao H, Xia Y, Li Y (2016) Can network linkage effects determine return? Evidence from Chinese stock market. PLoS ONE 11(6):e0156784
Article Google Scholar
Richardson S, Green PJ (1997) On Bayesian analysis of mixtures with an unknown number of components (with discussion). J R Stat Soc Ser B (Stat Methodol) 59(4):731–792
Article Google Scholar
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288
MathSciNet MATH Google Scholar
Tseng P (2001) Convergence of a block coordinate descent method for nondifferentiable minimization. J Optim Theory Appl 109(3):475–494
Article MathSciNet Google Scholar
Valentino SW, Moore JE, Cleveland MJ, Greenberg MT, Tan X (2014) Profiles of financial stress over time using subgroup analysis. J Fam Econ Issues 35(1):51–64
Article Google Scholar
Wang H, Banerjee A (2013) Online alternating direction method (longer version). Technical report. arXiv:1306.3721
Wang H, Li R, Tsai CL (2007) Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika 94(3):553–568
Article MathSciNet Google Scholar
Wang X, Grimson WEL, Westin CF (2011) Tractography segmentation using a hierarchical Dirichlet processes mixture model. NeuroImage 54(1):290–302
Article Google Scholar
Wu Z, Yu J (2019) A multi-level descriptor using ultra-deep feature for image retrieval. Multimed Tools Appl 78(18):25655–25672
Article Google Scholar
Xu L, Krzyzak A, Oja E (1993) Rival penalized competitive learning for clustering analysis, RBF net, and curve detection. IEEE Trans Neural Netw 4(4):636–649
Article Google Scholar
Zhang CH (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38(2):894–942
Article MathSciNet Google Scholar
Zhang R, Huang C, Zhang W, Chen S et al (2018) Multi factor stock selection model based on LSTM. Int J Econ Financ 10(8):1–36
Article Google Scholar

Download references

Acknowledgements

The authors are grateful to anonymous reviewers for their comments and suggestions on an earlier draft of the paper.

Author information

Authors and Affiliations

School of Statistics and Management & Institute of Data Science and Statistics, Shanghai University of Finance and Economics, 777 Guoding Road, Shanghai, 200433, China
Shuang Zhang & Xingdong Feng

Authors

Shuang Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Xingdong Feng
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Xingdong Feng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Xingdong Feng is the Corresponding Author, and his research was partially supported by National Natural Science Foundation of China (11971292 and 11690012), and Program for Innovative Research Team of Shanghai University of Finance and Economics.

Appendix

1.1 The computing details of the algorithm DisSRADMM

In this section we give the details on solving problems (10)–(13) in the proposed algorithm DisSRADMM.

Update $\mathbf {z}_m^{k+1}$ : consider

$$\begin{aligned} \begin{aligned} \mathbf {z}^{k+1}_{m}&= \arg \min _{\mathbf {z}_m}\mathcal {L}(\varvec{\mu }^k,\varvec{\beta }^k,\varvec{\eta }^k,\mathbf {z}_m^k,\mathbf {z}_{-m}^k,\varvec{\beta }_1^k,...,\varvec{\beta }_M^k, \varvec{\mu }_1^k,...,\varvec{\mu }_M^k, \\&\qquad \mathbf {s}^k, \mathbf {u}_1^k,...\mathbf {u}_{M}^k, \mathbf {q}_1^k,...\mathbf {q}_{M}^k,\mathbf {v}_1^k,...\mathbf {v}_{M}^k). \end{aligned} \end{aligned}$$

So we have

$$\begin{aligned} \begin{aligned} \mathbf {z}_m^{k+1}&= \arg \min _{\mathbf {z}_m}\bigg \{\frac{1}{2}\Vert \mathbf {z}_m\Vert ^2 + \frac{d_{m}}{2}\Vert \mathbf {y}_m - \mathbf {X}_m\varvec{\beta }_m^k - \mathbf {A}_m\varvec{\mu }_m^k-\mathbf {z}_m\Vert ^2\\&\quad +\langle \mathbf {y}_m - \mathbf {X}_m\varvec{\beta }_m^k - \mathbf {A}_m\varvec{\mu }_m^k-\mathbf {z}_m,\mathbf {u}_m^k \rangle \bigg \}. \end{aligned} \end{aligned}$$

The previous problem can be solved elementwisely, so for any $i=1,2,...,n_m$, we carry out the following updating procedure

$$\begin{aligned} \begin{aligned} z_{m,i}^{k+1}&= \arg \min _{z_{m,i}}\bigg \{\frac{1}{2}\Vert z_{m,i}\Vert ^2 +\frac{d_{m}}{2}\Vert \mathbf {y}_{m,i} - \mathbf {x}_{m,i}^{\top }\varvec{\beta }_m^k - \mathbf {A}_{m,i}^{\top }\varvec{\mu }_m^k-\mathbf {z}_{m,i}\Vert ^2\\&\quad +\langle \mathbf {y}_{m,i}- \mathbf {x}_{m,i}^{\top }\varvec{\beta }_m^k - \mathbf {A}_{m,i}^{\top }\varvec{\mu }_m^k-\mathbf {z}_{m,i},\mathbf {u}_{m,i}^k \rangle \bigg \}, \end{aligned} \end{aligned}$$

where $\mathbf {A}_{m,i}$ represents the ith row vector of the matrix $\mathbf {A}_m$. Thus, we consider the following equations

$$\begin{aligned} \begin{aligned} 0&=\frac{\partial }{\partial \mathbf {z}_m}\bigg \{\frac{1}{2}\Vert z_{m,i}\Vert ^2 +\frac{d_{m}}{2}\Vert \mathbf {y}_{m,i} - \mathbf {x}_{m,i}^{\top }\varvec{\beta }_m^k - \mathbf {A}_{m,i}^{\top }\varvec{\mu }_m^k-\mathbf {z}_{m,i}\Vert ^2\\&\quad +\langle \mathbf {y}_{m,i}- \mathbf {x}_{m,i}^{\top }\varvec{\beta }_m^k - \mathbf {A}_{m,i}^{\top }\varvec{\mu }_m^k-\mathbf {z}_{m,i},\mathbf {u}_{m,i}^k \rangle \bigg \}\\&=z_{m,i} - d_m(y_{m,i}-\mathbf {x}_{m,i}^{\top }\varvec{\beta }_m^k-\mathbf {A}_{m,i}^{\top }\varvec{\mu }_m^k-z_{m,i})-u_{m,i}^k. \end{aligned} \end{aligned}$$

It then follows that

$$\begin{aligned} z_{m,i} = \left\{ d_m(y_{m,i}-\mathbf {x}_{m,i}^{\top }\varvec{\beta }_m^k-\mathbf {A}_{m,i}^{\top }\varvec{\mu }_m^k)+u_{m,i}^k\right\} \bigg /\left( d_m+1 \right) . \end{aligned}$$

Update $\varvec{\beta }_m^{k+1}$: note that

$$\begin{aligned} \begin{aligned} \varvec{\beta }^{k+1}_{m}&= \arg \min _{\varvec{\beta }_m}\mathcal {L}(\varvec{\mu }^k,\varvec{\beta },\varvec{\beta }_m,\varvec{\beta }_{-m}^k,\varvec{\eta }^k,\varvec{\mu }_1^k,...,\varvec{\mu }_M^k, \mathbf {z}_1^k,...,\mathbf {z}_M^k, \\&\quad \mathbf {s}^k, \mathbf {u}_1^k,...\mathbf {u}_{M}^k, \mathbf {q}_1^k,...\mathbf {q}_{M}^k,\mathbf {v}_1^k,...\mathbf {v}_{M}^k), \end{aligned} \end{aligned}$$

which is given by

$$\begin{aligned} \begin{aligned} \varvec{\beta }_m^{k+1}&= \arg \min _{\varvec{\beta }_m}\bigg \{\frac{d_{m}}{2}\Vert \mathbf {y}_m - \mathbf {X}_m\varvec{\beta }_m - \mathbf {A}_m\varvec{\mu }_m^k-\mathbf {z}_m^k\Vert ^2\\&\quad +\langle \mathbf {y}_m - \mathbf {X}_m\varvec{\beta }_m - \mathbf {A}_m\varvec{\mu }_m^k-\mathbf {z}_m^k,\mathbf {u}_m^k \rangle + \frac{b_{m}}{2}\Vert \varvec{\beta }_{m}-\varvec{\beta }^k\Vert ^2\\&\quad + \langle \varvec{\beta }_{m}-\varvec{\beta }^k,\mathbf {v}_m^k \rangle \bigg \}. \end{aligned} \end{aligned}$$

We then have the following equations

$$\begin{aligned} \begin{aligned} 0&= \frac{\partial }{\partial \varvec{\beta }_m}\bigg \{\frac{d_{m}}{2}\Vert \mathbf {y}_m - \mathbf {X}_m\varvec{\beta }_m - \mathbf {A}_m\varvec{\mu }_m^k-\mathbf {z}_m^k\Vert ^2+\langle \mathbf {y}_m - \mathbf {X}_m\varvec{\beta }_m - \mathbf {A}_m\varvec{\mu }_m^k-\mathbf {z}_m^k,\mathbf {u}_m^k \rangle \\&\quad + \frac{b_{m}}{2}\Vert \varvec{\beta }_{m}-\varvec{\beta }^k\Vert ^2+ \langle \varvec{\beta }_{m}-\varvec{\beta }^k,\mathbf {v}_m^k \rangle \bigg \}\\&=-d_m\left( \mathbf {X}_m^{\top }\mathbf {y}_m - \mathbf {X}_m^{\top }\mathbf {X}_m\varvec{\beta }_m^k - \mathbf {X}_m^{\top }\mathbf {A}_m\varvec{\mu }_m-\mathbf {X}_m^{\top }\mathbf {z}_m^k \right) - \mathbf {X}_m^{\top }\mathbf {u}_m^k\\&\quad +b_m(\varvec{\beta }_m-\varvec{\beta }^k) + \mathbf {v}_m^k\\&=(d_m\mathbf {X}_m^{\top }\mathbf {X}_m + b_m\mathbf {I})\varvec{\beta }_m - d_m(\mathbf {X}_m^{\top }\mathbf {y}_m - \mathbf {X}_m^{\top }\mathbf {A}_m\varvec{\mu }_m^k-\mathbf {X}_m^{\top }\mathbf {z}_m^k)-\mathbf {X}_m^{\top }\mathbf {u}_m^k - b_m\varvec{\beta }^k + \mathbf {v}_m^k. \end{aligned} \end{aligned}$$

Hence, it follows that

$$\begin{aligned} \begin{aligned} (d_m\mathbf {X}_m^{\top }\mathbf {X}_m + b_m\mathbf {I})\varvec{\beta }_m&= d_m[\mathbf {X}_m^{\top }\mathbf {y}_m - \mathbf {X}_m^{\top }\mathbf {A}_m\varvec{\mu }_m^k-\mathbf {X}_m^{\top }\mathbf {z}_m^k]+\mathbf {X}_m^{\top }\mathbf {u}_m^k+ b_m\varvec{\beta }^k\\&\quad -\mathbf {v}_m^k, \end{aligned} \end{aligned}$$

which is equivalent to

$$\begin{aligned} \begin{aligned} \left( \frac{d_m}{b_m}\mathbf {X}_m^{\top }\mathbf {X}_m + \mathbf {I}\right) \varvec{\beta }_m&= \frac{d_m}{b_m}(\mathbf {X}_m^{\top }\mathbf {y}_m - \mathbf {X}_m^{\top }\mathbf {A}_m\varvec{\mu }_m^k-\mathbf {X}_m^{\top }\mathbf {z}_m)+\frac{1}{b_m}\mathbf {X}_m^{\top }\mathbf {u}_m^k\\&\quad + \varvec{\beta }^k-\frac{1}{b_m}\mathbf {v}_m^k. \end{aligned} \end{aligned}$$

Let $o_1=n_1+n_2+...+n_{m-1}$ and $o_2=o_1+m_m$. $\varvec{\mu }_{m,in}$ represents the elements between $o_1$th and $o_2$th and $\varvec{\mu }_{m,out}$ represents all the other elements in $\varvec{\mu }_m$. Then, we have

$$\begin{aligned} \begin{aligned} \varvec{\beta }_m^{k+1}&= \left( \frac{d_m}{b_m}\mathbf {X}_m^{\top }\mathbf {X}_m + \mathbf {I}\right) ^{-1}\bigg \{\frac{d_m}{b_m}(\mathbf {X}_m^{\top }\mathbf {y}_m - \mathbf {X}_m^{\top }\varvec{\mu }_{m,in}^k-\mathbf {X}_m^{\top }\mathbf {z}_m)+\frac{1}{b_m}\mathbf {X}_m^{\top }\mathbf {u}_m^k\\&\quad + \varvec{\beta }^k-\frac{1}{b_m}\mathbf {v}_m^k\bigg \}. \end{aligned} \end{aligned}$$

Update $\varvec{\mu }_m^{k+1}$: consider

$$\begin{aligned} \begin{aligned} \varvec{\mu }^{k+1}_{m}&= \arg \min _{\varvec{\mu }_m}\mathcal {L}(\varvec{\mu }^k,\varvec{\beta }^k,\varvec{\eta }^k,\varvec{\mu }_m,\varvec{\mu }_{-m}^k,\varvec{\beta }_1^k,...,\varvec{\beta }_M^k, \mathbf {z}_1^k,...,\mathbf {z}_M^k, \\&\quad \mathbf {s}^k, \mathbf {u}_1^k,...\mathbf {u}_{M}^k, \mathbf {q}_1^k,...\mathbf {q}_{M}^k,\mathbf {v}_1^k,...\mathbf {v}_{M}^k), \end{aligned} \end{aligned}$$

which is given by

$$\begin{aligned} \begin{aligned} \varvec{\mu }_m^{k+1}&= \arg \min _{\varvec{\mu }_m}\bigg \{\frac{d_{m}}{2}\Vert \mathbf {y}_m - \mathbf {X}_m\varvec{\beta }_m^k - \mathbf {A}_m\varvec{\mu }_m-\mathbf {z}_m^k\Vert ^2\\&\quad +\langle \mathbf {y}_m - \mathbf {X}_m\varvec{\beta }_m^k - \mathbf {A}_m\varvec{\mu }_m-\mathbf {z}_m^k,\mathbf {u}_m^k \rangle + \frac{a_{m}}{2}\Vert \varvec{\mu }_{m}-\varvec{\mu }^k\Vert ^2\\&\quad + \langle \varvec{\mu }_{m}-\varvec{\mu }^k,\mathbf {q}_m^k \rangle \bigg \}. \end{aligned} \end{aligned}$$

This optimization problem can be solved by considering the following equations

$$\begin{aligned} \begin{aligned} 0&= \frac{\partial }{\partial \varvec{\mu }_m}\bigg \{\frac{d_{m}}{2}\Vert \mathbf {y}_m - \mathbf {X}_m\varvec{\beta }_m^k - \mathbf {A}_m\varvec{\mu }_m-\mathbf {z}_m^k\Vert ^2+\langle \mathbf {y}_m - \mathbf {X}_m\varvec{\beta }_m^k - \mathbf {A}_m\varvec{\mu }_m-\mathbf {z}_m^k,\mathbf {u}_m^k \rangle \\&\quad + \frac{a_{m}}{2}\Vert \varvec{\mu }_{m}-\varvec{\mu }\Vert ^2+ \langle \varvec{\mu }_{m}-\varvec{\mu },\mathbf {q}_m^k \rangle \bigg \}\\&=-d_m\left( \mathbf {A}_m^{\top }\mathbf {y}_m - \mathbf {A}_m^{\top }\mathbf {X}_m\varvec{\beta }_m^k - \mathbf {A}_m^{\top }\mathbf {A}_m\varvec{\mu }_m-\mathbf {A}_m^{\top }\mathbf {z}_m^k \right) - \mathbf {A}_m^{\top }\mathbf {u}_m^k\\&\quad +a_m(\varvec{\mu }_m-\varvec{\mu }^k) + \mathbf {q}_m^k\\&=(d_m\mathbf {A}_m^{\top }\mathbf {A}_m + a_m\mathbf {I})\varvec{\mu }_m - d_m(\mathbf {A}_m^{\top }\mathbf {y}_m - \mathbf {A}_m^{\top }\mathbf {X}_m\varvec{\beta }_m^k-\mathbf {A}_m^{\top }\mathbf {z}_m^k)-\mathbf {A}_m^{\top }\mathbf {u}_m^k - a_m\varvec{\mu }^k + \mathbf {q}_m^k. \end{aligned} \end{aligned}$$

Hence, we obtain that

$$\begin{aligned} \begin{aligned}&\left( \frac{d_m}{a_m}\mathbf {A}_m^{\top }\mathbf {A}_m + \mathbf {I}\right) \varvec{\mu }_m = \frac{d_m}{a_m}(\mathbf {A}_m^{\top }\mathbf {y}_m - \mathbf {A}_m^{\top }\mathbf {X}_m\varvec{\beta }_m^k-\mathbf {A}_m^{\top }\mathbf {z}_m)+\frac{1}{a_m}\mathbf {A}_m^{\top }\mathbf {u}_m^k\\&\quad + \varvec{\mu }^k-\frac{1}{a_m}\mathbf {q}_m^k. \end{aligned} \end{aligned}$$

Although it involves computing the inverse of large matrices, there are some tricks. Suppose the samples are arranged by nodes. Then the element at ith row and jth column of the matrix $\mathbf {A}_m$ is given by

$$\begin{aligned} A_{m,ij}=\left\{ \begin{array}{lcl} 1&{}&{}o_{m1}\le i=j\le o_{m2}\\ 0&{}&{} \textit{otherwise} \end{array}, \right. \end{aligned}$$

where $o_{m1}=n_1+n_2+\cdots +n_{m-1}$ and $o_{m2}=o_{m1}+n_m$. Due to these special structures, we divide $\varvec{\mu }_m$ into two parts $\varvec{\mu }_{m,in}$ and $\varvec{\mu }_{m,out}$, where $\varvec{\mu }_{m,in}$ is composed of the $o_{m1}$th to the $o_{m2}$th elements, and $\varvec{\mu }_{m,out}$ contains all the rest elements. Similar segmentation is also applied to $\varvec{\mu }^k$ and $\mathbf {q}_m^k$, which is $\varvec{\mu }_{in}^k$ , $\varvec{\mu }_{out}^k$, $\mathbf {q}_{m,in}^k$ and $\mathbf {q}_{m,out}^k$. Then we have

$$\begin{aligned} \varvec{\mu }_{m,in}^{k+1} = \frac{a_m}{d_m+a_m}\left\{ \frac{d_m}{a_m}(\mathbf {y}_m-\mathbf {X}_m\varvec{\beta }_m^k-\mathbf {z}_m^k)+\frac{1}{a_m}\mathbf {u}_m^k + \varvec{\mu }_{in}^k - \mathbf {q}_{m,in}^k\right\} . \end{aligned}$$

Since the problem can also be solved elementwisely, then for any $i=1,2,...,n_m$, we obtain that

$$\begin{aligned} \mu _{m,in,i}^{k+1} = \frac{d_m}{d_m+a_m}(y_{m,i}-\mathbf {x}_{m,i}^{\top }\varvec{\beta }_m^k-z_{m,i})+\frac{1}{d_m+a_m}(u_{m,in,i}^k+a_m\mu _{in,i}^k-q_{m,in,i}^k). \end{aligned}$$

Similarly, we can update $\varvec{\mu }_{m,out}^k$ elementwisely

$$\begin{aligned} \mu _{m,out,i}^{k+1}=\mu _{out,i}-\frac{1}{a_m}q_{m,out,i}^k. \end{aligned}$$

Update $\varvec{\mu }^{k+1}$: note that

$$\begin{aligned} \begin{aligned} \varvec{\mu }^{k+1}&= \arg \min _{\varvec{\mu }}\mathcal {L}(\varvec{\mu },\varvec{\beta }^k,\varvec{\eta }^k,\varvec{\mu }_1^k,...,\varvec{\mu }_M^k,\varvec{\beta }_1^k,...,\varvec{\beta }_M^k, \mathbf {z}_1^k,...,\mathbf {z}_M^k, \\&\qquad \mathbf {s}^k, \mathbf {u}_1^k,...\mathbf {u}_{M}^k, \mathbf {q}_1^k,...\mathbf {q}_{M}^k,\mathbf {v}_1^k,...\mathbf {v}_{M}^k), \end{aligned} \end{aligned}$$

which is given by

$$\begin{aligned} \begin{aligned} \varvec{\mu }^{k+1}&= \arg \min _{\varvec{\mu }}\frac{c}{2}\Vert \mathbf {W}\varvec{\mu }-\varvec{\eta }\Vert ^2+<\mathbf {W}\varvec{\mu }-\varvec{\eta },\mathbf {s}>+\sum _{m=1}^{M}\Big (\frac{a_{m}}{2}\Vert \varvec{\mu }_{m}-\varvec{\mu }\Vert ^2+ \\&\qquad \langle \varvec{\mu }_{m}-\varvec{\mu },\mathbf {q}_m \rangle \Big ). \end{aligned} \end{aligned}$$

Thus, we can solve the following equations

$$\begin{aligned} \begin{aligned} 0&= c\left( \mathbf {W}^{\top }\mathbf {W}\varvec{\mu }-\mathbf {W}^{\top }\varvec{\eta }^k\right) +\mathbf {W}^{\top }\mathbf {s}^k+\sum _{m=1}^M\left( a_m\varvec{\mu }-a_m\varvec{\mu }_m^{k+1}-\mathbf {q}_m^{k+1}\right) \\&=c\left( \mathbf {W}^{\top }\mathbf {W}\varvec{\mu }-\mathbf {W}^{\top }\varvec{\eta }^k\right) +\mathbf {W}^{\top }\mathbf {s}^k+\sum _{m=1}^M a_m\varvec{\mu }-\sum _{m=1}^M a_m\varvec{\mu }_m^{k+1}-\sum _{m=1}^M \mathbf {q}_m^{k+1}. \end{aligned} \end{aligned}$$

We then have

$$\begin{aligned} \left( \mathbf {W}^{\top }\mathbf {W}+\frac{1}{c}\sum _{m=1}^Ma_m\mathbf {I}\right) \varvec{\mu }= c\mathbf {W}^{\top }\varvec{\eta }^k-\mathbf {W}^{\top }\mathbf {s}^k+\sum _{m=1}^M a_m\varvec{\mu }_m^{k+1}+\sum _{m=1}^M \mathbf {q}_m^{k+1}. \end{aligned}$$

Let $\varvec{\mu }_s^{k+1}=\sum _{m=1}^Ma_m\varvec{\mu }_m^{k+1}$ and $\mathbf {q}_s^{k+1} = \sum _{m=1}^Mq_m^{k+1}$. Note that $\mathbf {W}^{\top }\mathbf {W}= N\mathbf {I}- \mathbf {1}\mathbf {1}^{\top }$, and $\mathbf {W}$ is a sparse matrix. Let $r_{ij}$ denote the element at the ith row and the jth column of the matrix $\left( \mathbf {W}^{\top }\mathbf {W}++\frac{1}{c}\sum _{m=1}^Ma_m\mathbf {I}\right) ^{-1}$ and $S=N-1+\frac{1}{c}\sum _{m=1}^Ma_m$, we have

$$\begin{aligned} r_{ij}=\left\{ \begin{array}{lcl} \frac{RC + N - 2}{((N-1)-RC\times (N-2)-RC\times RC)\times (\sum _{m=1}^Ma_m-c)}&{}&{} i=j\\ \frac{1}{((N-1)-RC\times (N-2)-RC\times RC)\times (\sum _{m=1}^Ma_m-c)}&{}&{} i\ne j \end{array}, \right. \end{aligned}$$

The detail of updating $\varvec{\mu }^{k+1}$ is given in Algorithm 2, and the calculation complexity of updating $\varvec{\mu }^{k+1}$ is $O(N^2)$.

Update $\varvec{\beta }$: note that

$$\begin{aligned} \begin{aligned} \varvec{\beta }^{k+1}&=\arg \min _{\varvec{\beta }}\mathcal {L}(\varvec{\mu }^{k+1},\varvec{\beta },\varvec{\eta }^k,\varvec{\mu }_1^{k+1},...,\varvec{\mu }_M^{k+1},\varvec{\beta }_1^{k+1},...,\varvec{\beta }_M^{k+1}, \mathbf {z}_1^k,...,\mathbf {z}_M^k, \\&\qquad \mathbf {s}^k, \mathbf {u}_1^k,...\mathbf {u}_{M}^k, \mathbf {q}_1^k,...\mathbf {q}_{M}^k,\mathbf {v}_1^k,...\mathbf {v}_{M}^k), \end{aligned} \end{aligned}$$

which is given by

$$\begin{aligned} \varvec{\beta }^{k+1}=\arg \min _{\varvec{\beta }}\sum _{m=1}^{M}\Big (\frac{b_{m}}{2}\Vert \varvec{\beta }_{m}-\varvec{\beta }\Vert ^2+ \langle \varvec{\beta }_{m}-\varvec{\beta },\mathbf {v}_m \rangle \Big ). \end{aligned}$$

We then solve the following equation

$$\begin{aligned} 0=\sum _{m=1}^Mb_m\varvec{\beta }-\sum _{m=1}^Mb_m\varvec{\beta }_m^{k+1}-\sum _{m=1}^M\mathbf {v}_m^{k+1}, \end{aligned}$$

which leads to

$$\begin{aligned} \varvec{\beta }^{k+1} = \frac{1}{\sum _{m=1}^Mb_m}\left( \sum _{m=1}^Mb_m\varvec{\beta }_m^{k+1}+\sum _{m=1}^M\mathbf {v}_m^{k+1}\right) . \end{aligned}$$

Update $\varvec{\eta }$: note that

$$\begin{aligned} \begin{aligned} \varvec{\eta }^{k+1}&=\arg \min _{\varvec{\eta }}\mathcal {L}(\varvec{\mu }^{k+1},\varvec{\beta }^{k+1},\varvec{\eta },\varvec{\mu }_1^{k+1},...,\varvec{\mu }_M^{k+1},\varvec{\beta }_1^{k+1},...,\varvec{\beta }_M^{k+1}, \mathbf {z}_1^k,...,\mathbf {z}_M^k, \\&\quad \mathbf {s}^k, \mathbf {u}_1^k,...\mathbf {u}_{M}^k, \mathbf {q}_1^k,...\mathbf {q}_{M}^k,\mathbf {v}_1^k,...\mathbf {v}_{M}^k), \end{aligned} \end{aligned}$$

which is given by

$$\begin{aligned} \varvec{\eta }^{k+1} = \arg \min _{\varvec{\eta }}\left( P_N(|\varvec{\eta }|,\varvec{\pi }) + \frac{c}{2} \Vert \mathbf {W}\varvec{\mu }^{k+1} - \varvec{\eta }\Vert ^2 + \langle \mathbf {W}\varvec{\mu }^{k+1} - \varvec{\eta }, \mathbf {s}^k\rangle \right) . \end{aligned}$$

Since the penalty function is convex in $\eta _{ij}$, then 0 must be contained in the subdifferential of this function at point $\eta ^{k+1}_{ij}$.

$$\begin{aligned} \begin{aligned}&0 \in \partial \bigg \{P_N(\varvec{\eta }, \varvec{\pi })+\frac{c}{2}{\Vert W\varvec{\beta }^{k+1}-\varvec{\eta }\Vert }^2+<W\varvec{\eta }^{k+1}-\varvec{\eta },\mathbf {s}^k>\bigg \}\bigg |_{\varvec{\eta }=\varvec{\eta }^{k+1}}\\ \Rightarrow&0\in \partial P_N(\varvec{\eta }, \varvec{\pi })|_{\varvec{\eta }=\varvec{\eta }^{k+1}} + c\varvec{\eta }^{k+1}-cW\varvec{\mu }^{k+1}-\mathbf {s}^k. \end{aligned} \end{aligned}$$

Let $\delta _{ij}=\mathbf {W}_{ij}^T\varvec{\mu }^{k+1}+\frac{1}{c}s_{ij}$, we then have $L_1$ penalty:

$$\begin{aligned} {\hat{\eta }}_{ij}=T(\delta _{ij},\pi /c); \end{aligned}$$

MCP:

$$\begin{aligned} {\hat{\eta }}_{ij}=\left\{ \begin{array}{lcl} \frac{T(\delta _{ij},\pi /c)}{1-1/(\alpha c)}&{}&{}if |\delta _{ij}|\le \alpha \pi \\ \delta _{ij}&{}&{}if |\delta _{ij}|>\alpha \pi \end{array}; \right. \end{aligned}$$

SCAD:

$$\begin{aligned} {\hat{\eta }}_{ij}=\left\{ \begin{array}{lcl} T(\delta _{ij},\pi /c)&{}&{}if|\delta _{ij}|\le \pi +\pi /c\\ \frac{T(\delta _{ij},\alpha \pi /\{(\alpha -1)c\})}{1-1/\{(\alpha -1)c\}}&{}&{}if \pi +\pi /c<|\delta _{ij}|\le \alpha \pi \\ \delta _{ij}&{}&{}if |\delta _{ij}|>\alpha \pi \end{array}, \right. \end{aligned}$$

where $T(t,\pi )=sign(t)(|t|-\pi )_+$.

1.2 Proof of Theorem 1

Here we take $(\mathbf {z}_1,...,\mathbf {z}_M,\varvec{\beta }_1,...,\varvec{\beta }_M,\varvec{\mu }_1,...,\varvec{\mu }_M,\varvec{\eta })$ as a whole because their updates can be implemented elementwisely. By the definition of

$$\begin{aligned} \big (\mathbf {z}_1^{k+1},...,\mathbf {z}_M^{k+1},\varvec{\beta }_1^{k+1},...,\varvec{\beta }_M^{k+1},\varvec{\mu }_1^{k+1},...,\varvec{\mu }_M^{k+1},\varvec{\eta }^{k+1}\big ), \end{aligned}$$

we have

$$\begin{aligned} \begin{aligned}&\mathcal {L}(\varvec{\beta }^{k+1},\varvec{\mu }^{k+1}, \mathbf {z}_1^{k+1},...,\mathbf {z}_M^{k+1},\varvec{\beta }_1^{k+1},...,\varvec{\beta }_M^{k+1},\varvec{\mu }_1^{k+1},...,\varvec{\mu }_M^{k+1},\varvec{\eta }^{k+1},\\&\qquad \mathbf {s}^{k},\mathbf {u}_1^{k},...,\mathbf {u}_M^{k},\mathbf {v}_1^{k},...,\mathbf {v}_M^{k},\mathbf {q}_1^{k},...,\mathbf {q}_M^{k})\\&\quad \le \mathcal {L}(\varvec{\beta }^{k+1}, \varvec{\mu }^{k+1}, \mathbf {z}_1,...,\mathbf {z}_M,\varvec{\beta }_1,...,\varvec{\beta }_M,\varvec{\mu }_1,...,\varvec{\mu }_M,\varvec{\eta },\mathbf {s}^{k},\mathbf {u}_1^{k},...,\mathbf {u}_M^{k},\mathbf {v}_1^{k},...,\mathbf {v}_M^{k},\\&\qquad \mathbf {q}_1^{k},...,\mathbf {q}_M^{k}), \end{aligned} \end{aligned}$$

for any $(\mathbf {z}_1,...,\mathbf {z}_M,\varvec{\beta }_1,...,\varvec{\beta }_M,\varvec{\mu }_1,...,\varvec{\mu }_M,\varvec{\eta })$.

It then follows that

$$\begin{aligned} \begin{aligned}&\mathcal {L}(\varvec{\beta }^{k+1},\varvec{\mu }^{k+1}, \mathbf {z}_1^{k+1},...,\mathbf {z}_M^{k+1},\varvec{\beta }_1^{k+1},...,\varvec{\beta }_M^{k+1},\varvec{\mu }_1^{k+1},...,\varvec{\mu }_M^{k+1},\varvec{\eta }^{k+1},\\&\qquad \mathbf {s}^{k},\mathbf {u}_1^{k},...,\mathbf {u}_M^{k},\mathbf {v}_1^{k},...,\mathbf {v}_M^{k},\mathbf {q}_1^{k},...,\mathbf {q}_M^{k})\\&\quad \le \inf _{\begin{array}{c} \mathbf {z},\varvec{\beta },\varvec{\mu },\varvec{\eta } \end{array} }\mathcal {L}(\varvec{\beta }^{k+1}, \mathbf {z}_1,...,\mathbf {z}_M,\varvec{\beta }_1,...,\varvec{\beta }_M,\varvec{\mu }_1,...,\varvec{\mu }_M,\varvec{\eta }, \mathbf {s}^{k},\mathbf {u}_1^{k},...,\mathbf {u}_M^{k},\mathbf {v}_1^{k},...,\mathbf {v}_M^{k},\mathbf {q}_1^{k},...,\mathbf {q}_M^{k})\\&\quad =\frac{1}{M}\sum _{m=1}^M\Vert \mathbf {y}_m-\mathbf {X}_m\varvec{\beta }^{k+1}-\mathbf {A}_m\varvec{\mu }^{k+1}\Vert ^2 + P_N((\mathbf {W}\varvec{\mu }^{k+1}), \varvec{\pi })\triangleq l^{k+1}. \end{aligned} \end{aligned}$$

We write the objective function $\mathcal {L}$ in the following form

$$\begin{aligned} \mathcal {L} = \mathcal {L}_0+\mathcal {L}_1, \end{aligned}$$

where

$$\begin{aligned} \begin{aligned}&\mathcal {L}_0(\varvec{\beta },\varvec{\mu },\mathbf {z}_1,...,\mathbf {z}_M,\varvec{\beta }_1,...,\varvec{\beta }_M,\varvec{\mu }_1,...,\varvec{\mu }_M,\varvec{\eta },\mathbf {s},\mathbf {u}_1,...,\mathbf {u}_M,\mathbf {v}_1,...,\mathbf {v}_M,\mathbf {q}_1,...,\mathbf {q}_M)\\&\quad =\frac{c}{2}\Vert \mathbf {W}\varvec{\mu }-\varvec{\eta }\Vert ^2 +<\mathbf {W}\varvec{\mu }-\varvec{\eta }, \mathbf {s}>\\&\qquad +\sum _{m=1}^M\bigg (\frac{d_m}{2}\Vert \mathbf {y}_m-\mathbf {X}_m\varvec{\beta }_m-\mathbf {A}_m\mu _m-\mathbf {z}_m\Vert ^2\\&\qquad +<\mathbf {y}_m-\mathbf {X}_m\varvec{\beta }_m-\mathbf {A}_m\mu _m-\mathbf {z}_m,\mathbf {u}_m>\bigg )\\&\qquad +\sum _{m=1}^M\left( \frac{b_m}{2}\Vert \varvec{\beta }-\varvec{\beta }_m\Vert ^2+<\varvec{\beta }-\varvec{\beta }_m,\mathbf {v}_m>\right) \\&\qquad +\sum _{m=1}^M\left( \frac{a_m}{2}\Vert \varvec{\mu }-\varvec{\mu }_m\Vert ^2+<\varvec{\mu }-\varvec{\mu }_m,\mathbf {q}_m>\right) ,\\ \text {and}\\&\qquad \mathcal {L}_1(\varvec{\beta },\varvec{\mu },\mathbf {z}_1,...,\mathbf {z}_M,\varvec{\beta }_1,...,\varvec{\beta }_M,\varvec{\mu }_1,...,\varvec{\mu }_M,\varvec{\eta },\mathbf {s},\mathbf {u}_1,...,\mathbf {u}_M,\mathbf {v}_1,...,\mathbf {v}_M,\mathbf {q}_1,...,\mathbf {q}_M)\\&\quad =\frac{1}{M}\sum _{m=1}^M\Vert \mathbf {z}_m\Vert ^2 + P_N(\varvec{\eta }, \varvec{\pi }). \end{aligned} \end{aligned}$$

Since $\mathcal {L}_0$ is differentiable with respect to $(\mathbf {z}_m,\varvec{\beta }_m,\varvec{\mu }_m)$, where $m=1,...M$ and $\mathcal {L}_1$ is convex with respect to $\varvec{\eta }$. By Lemma 3.1 and Theorem 4.1 of Tseng (2001), we know that there exists a stationary point of $\mathcal {L}$, denoted as $(\varvec{\beta }^*,\mathbf {s}^*,\mathbf {u}_m^*, \mathbf {v}_m^*)$, where $m = 1, ... M$ and let $l^*$ be its corresponding value of the objective function $\mathcal {L}$.

Therefore we have

$$\begin{aligned} l^* = \lim _{k\rightarrow \infty }l^{k} = \lim _{k\rightarrow \infty }l^{k+t}, \end{aligned}$$

for any positive integer t. Note that

$$\begin{aligned} \left( \begin{array}{c} \mathbf {s}^{k+t}\\ \mathbf {u}_1^{k+t}\\ \vdots \\ \mathbf {u}_M^{k+t}\\ \mathbf {v}_1^{k+t}\\ \vdots \\ \mathbf {v}_M^{k+t}\\ \mathbf {q}_1^{k+t}\\ \vdots \\ \mathbf {q}_M^{k+t} \end{array}\right) = \left( \begin{array}{c} \mathbf {s}^{k}\\ \mathbf {u}_1^{k}\\ \vdots \\ \mathbf {u}_M^{k}\\ \mathbf {v}_1^{k}\\ \vdots \\ \mathbf {v}_M^{k}\\ \mathbf {q}_1^{k}\\ \vdots \\ \mathbf {q}_M^{k} \end{array}\right) + \sum _{i=1}^{t-1}\left( \begin{array}{c} c(\mathbf {W}\varvec{\mu }^{k+i}-\varvec{\eta }^{k+i})\\ d_1(\mathbf {y}_1 - \mathbf {X}_1\varvec{\beta }_1^{k+i} -\mathbf {A}_m\mu _m^{k+i}- \mathbf {z}_1^{k+i})\\ \vdots \\ d_M(\mathbf {y}_M - \mathbf {X}_M\varvec{\beta }_M^{k+i} -\mathbf {A}_m\mu _m^{k+i}- \mathbf {z}_M^{k+i})\\ b_1(\varvec{\beta }^{k+i}-\varvec{\beta }_1^{k+i})\\ \vdots \\ b_M(\varvec{\beta }^{k+i}-\varvec{\beta }_M^{k+i})\\ a_1(\varvec{\mu }^{k+i}-\varvec{\mu }_1^{k+i})\\ \vdots \\ a_M(\varvec{\mu }^{k+i}-\varvec{\mu }_M^{k+i}) \end{array} \right) , \end{aligned}$$

so we have

$$\begin{aligned}&\mathcal {L}(\varvec{\beta }^{k+1},\varvec{\mu }^{k+1}, \mathbf {z}_1^{k+t+1},...,\mathbf {z}_M^{k+t+1},\varvec{\beta }_1^{k+t+1},...,\varvec{\beta }_M^{k+t+1},\varvec{\mu }_1^{k+t+1},...,\varvec{\mu }_M^{k+t+1},\varvec{\eta }^{k+t+1},\\&\qquad \mathbf {s}^{k+t},\mathbf {u}_1^{k+t},...,\mathbf {u}_M^{k+t},\mathbf {v}_1^{k+t},...,\mathbf {v}_M^{k+t},\mathbf {q}_1^{k+t},...,\mathbf {q}_M^{k+t})\\&\quad =\frac{1}{M}\sum _{m=1}^M\Vert \mathbf {z}_m^{k+t+1}\Vert ^2 + P_N(\varvec{\eta }_j^{k+t+1}, \varvec{\pi })\\&\qquad +\frac{d_1}{2}\Vert \mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^{k+t+1}-\mathbf {A}_m\mu _1^{k+t+1}-\mathbf {z}_1^{k+t+1}\Vert ^2\\&\qquad +...+\frac{d_M}{2}\Vert \mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^{k+t+1}-\mathbf {A}_M\mu _M^{k+t+1}-\mathbf {z}_M^{k+t+1}\Vert ^2\\&\qquad +\frac{b_1}{2}\Vert \varvec{\beta }^{k+t+1}-\varvec{\beta }_1^{k+t+1}\Vert ^2+...+\frac{b_M}{2}\Vert \varvec{\beta }^{k+t+1}-\varvec{\beta }_M^{k+t+1}\Vert ^2 +\frac{c}{2}\Vert \mathbf {W}\varvec{\beta }^{k+t+1}-\varvec{\eta }^{k+t+1}\Vert ^2\\&\qquad +\frac{a_1}{2}\Vert \varvec{\mu }^{k+t+1}-\varvec{\mu }_1^{k+t+1}\Vert ^2+...+\frac{a_M}{2}\Vert \varvec{\mu }^{k+t+1}-\varvec{\mu }_M^{k+t+1}\Vert ^2\\&\qquad +\mathbf {u}_1^{k+tT}(\mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^{k+t+1}-\mathbf {z}_1^{k+t+1})+...+\mathbf {u}_M^{k+tT}(\mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^{k+t+1}-\mathbf {z}_M^{k+t+1})\\&\qquad +\mathbf {v}_1^{k+tT}(\varvec{\beta }^{k+t+1}-\varvec{\beta }_1^{k+t+1})+...+\mathbf {v}_M^{k+tT}(\varvec{\beta }^{k+t+1}-\varvec{\beta }_M^{k+t+1})\\&\qquad +\mathbf {q}_1^{k+tT}(\varvec{\mu }^{k+t+1}-\varvec{\mu }_1^{k+t+1})+...+\mathbf {q}_M^{k+tT}(\varvec{\mu }^{k+t+1}-\varvec{\mu }_M^{k+t+1})\\&\qquad +\mathbf {s}^{k+tT}(\mathbf {W}\varvec{\mu }^{k+t+1}-\varvec{\eta }^{k+t+1})\\&\quad =\frac{1}{M}\sum _{m=1}^M\Vert \mathbf {z}_m^{k+t+1}\Vert ^2 + P_N(\varvec{\eta }_j^{k+t+1}, \varvec{\pi })\\&\qquad +\frac{a_1}{2}\Vert \mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^{k+t+1}-\mathbf {z}_1^{k+t+1}\Vert ^2+...+\frac{a_M}{2}\Vert \mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^{k+t+1}-\mathbf {z}_M^{k+t+1}\Vert ^2\\&\qquad +\frac{b_1}{2}\Vert \varvec{\beta }^{k+t+1}-\varvec{\beta }_1^{k+t+1}\Vert ^2+...+\frac{b_M}{2}\Vert \varvec{\beta }^{k+t+1}-\varvec{\beta }_M^{k+t+1}\Vert ^2 +\frac{c}{2}\Vert \mathbf {W}\varvec{\beta }^{k+t+1}-\varvec{\eta }^{k+t+1}\Vert ^2\\&\qquad +\mathbf {u}_1^{kT}(\mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^{k+t+1}-\mathbf {z}_1^{k+t+1})+...+\mathbf {u}_M^{kT}(\mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^{k+t+1}-\mathbf {z}_M^{k+t+1})\\&\qquad +\mathbf {v}_1^{kT}(\varvec{\beta }^{k+t+1}-\varvec{\beta }_1^{k+t+1})+...+\mathbf {v}_M^{kT}(\varvec{\beta }^{k+t+1}-\varvec{\beta }_M^{k+t+1})+\mathbf {s}^{kT}(\mathbf {W}\varvec{\beta }^{k+t+1}-\varvec{\eta }^{k+t+1})\\&\qquad +\sum _{i=1}^{t-1}d_1(\mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^{k+i}-\mathbf {A}_1\mu _1^{k+i}-\mathbf {z}_1^{k+i})^{\top }(\mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^{k+t+1}-\mathbf {A}_1\mu _1^{k+t+1}-\mathbf {z}_1^{k+t+1})\\&\qquad \vdots \\&\qquad +\sum _{i=1}^{t-1}d_M(\mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^{k+i}-\mathbf {A}_M\mu _M^{k+i}-\mathbf {z}_M^{k+i})^{\top }(\mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^{k+t+1}-\mathbf {A}_M\mu _M^{k+t+1}\\&\qquad -\mathbf {z}_M^{k+t+1})+\sum _{i=1}^{t-1}b_1(\varvec{\beta }^{k+i}-\varvec{\beta }_1^{k+i})^{\top }(\varvec{\beta }^{k+t+1}-\varvec{\beta }_1^{k+t+1})\\&\qquad \vdots \\&\qquad +\sum _{i=1}^{t-1}b_M(\varvec{\beta }^{k+i}-\varvec{\beta }_M^{k+i})^{\top }(\varvec{\beta }^{k+t+1}-\varvec{\beta }_M^{k+t+1})+\sum _{i=1}^{t-1}a_1(\varvec{\mu }^{k+i}-\varvec{\mu }_1^{k+i})^{\top }(\varvec{\mu }^{k+t+1}\\&\qquad -\varvec{\mu }_1^{k+t+1})\\&\qquad \vdots \\&\qquad +\sum _{i=1}^{t-1}a_M(\varvec{\mu }^{k+i}-\varvec{\mu }_M^{k+i})^{\top }(\varvec{\mu }^{k+t+1}-\varvec{\mu }_M^{k+t+1})+\sum _{i=1}^{t-1}c(\mathbf {W}\varvec{\mu }^{k+i}-\varvec{\eta }^{k+i})^{\top }(\mathbf {W}\varvec{\mu }^{k+t+1}\\&\qquad -\varvec{\eta }^{k+t+1})\\&\quad \le l^{k+t+1}. \end{aligned}$$

Hence, we obtain that

$$\begin{aligned}&\lim _{k\rightarrow \infty }\mathcal {L}(\varvec{\beta }^{k+1},\varvec{\mu }^{k+1}, \mathbf {z}_1^{k+t+1},...,\mathbf {z}_M^{k+t+1},\varvec{\beta }_1^{k+t+1},...,\varvec{\beta }_M^{k+t+1},\varvec{\mu }_1^{k+t+1},...,\varvec{\mu }_M^{k+t+1},\varvec{\eta }^{k+t+1},\\&\qquad \mathbf {s}^{k+t},\mathbf {u}_1^{k+t},...,\mathbf {u}_M^{k+t},\mathbf {v}_1^{k+t},...,\mathbf {v}_M^{k+t},\mathbf {q}_1^{k+t},...,\mathbf {q}_M^{k+t})\\&\quad =\frac{1}{M}\sum _{m=1}^M\Vert \mathbf {z}_m^*\Vert ^2 + P_N(\varvec{\eta }_j^*, \varvec{\pi })\\&\qquad +\lim _{k\rightarrow \infty }\mathbf {u}_1^{k\top }(\mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^*-\mathbf {A}_1\varvec{\mu }_1^*-\mathbf {z}_1^*)+...+\lim _{k\rightarrow \infty }\mathbf {u}_M^{k\top }(\mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^*-\mathbf {A}_M\varvec{\mu }_M^*\\&\qquad -\mathbf {z}_M^*)+\lim _{k\rightarrow \infty }\mathbf {v}_1^{k\top }(\varvec{\beta }^*-\varvec{\beta }_1^*)+...+\lim _{k\rightarrow \infty }\mathbf {v}_M^{k\top }(\varvec{\beta }^*-\varvec{\beta }_M^*)\\&\qquad +\lim _{k\rightarrow \infty }\mathbf {q}_1^{k\top }(\varvec{\mu }^*-\varvec{\mu }_1^*)+...+\lim _{k\rightarrow \infty }\mathbf {q}_M^{k\top }(\varvec{\mu }^*-\varvec{\mu }_M^*)\\&\qquad +\lim _{k\rightarrow \infty }\mathbf {s}^{k\top }(\mathbf {W}\varvec{\mu }^*-\varvec{\eta }^*)\\&\qquad +\left( t-\frac{1}{2}\right) (a_1\Vert \mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^*-\mathbf {z}_1^*\Vert ^2+...+a_M\Vert \mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^*-\mathbf {z}_M^*\Vert ^2)\\&\qquad +\left( t-\frac{1}{2}\right) (b_1\Vert \varvec{\beta }^*-\varvec{\beta }_1^*\Vert ^2+...+b_M\Vert \varvec{\beta }^*-\varvec{\beta }_M^*\Vert ^2)\\&\qquad +\left( t-\frac{1}{2}\right) (c\Vert \mathbf {W}\varvec{\beta }^*-\varvec{\eta }^*\Vert ^2)\\&\quad =l^*+\lim _{k\rightarrow \infty }\mathbf {u}_1^{k\top }(\mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^*-\mathbf {A}_1\varvec{\mu }_1^*-\mathbf {z}_1^*)+...+\lim _{k\rightarrow \infty }\mathbf {u}_M^{k\top }(\mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^*-\mathbf {A}_M\varvec{\mu }_m^*\\&\qquad -\mathbf {z}_M^*)+\lim _{k\rightarrow \infty }\mathbf {v}_1^{k\top }(\varvec{\beta }^*-\varvec{\beta }_1^*)+...+\lim _{k\rightarrow \infty }\mathbf {v}_M^{k\top }(\varvec{\beta }^*-\varvec{\beta }_M^*)\\&\qquad +\lim _{k\rightarrow \infty }\mathbf {q}_1^{k\top }(\varvec{\mu }^*-\varvec{\mu }_1^*)+...+\lim _{k\rightarrow \infty }\mathbf {q}_M^{k\top }(\varvec{\mu }^*-\varvec{\mu }_M^*)\\&\qquad +\lim _{k\rightarrow \infty }\mathbf {s}^{k\top }(\mathbf {W}\varvec{\mu }^*-\varvec{\eta }^*)\\&\qquad +\left( t-\frac{1}{2}\right) (d_1\Vert \mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^*-\mathbf {A}_1\varvec{\mu }_1^*-\mathbf {z}_1^*\Vert ^2+...+d_M\Vert \mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^*-\mathbf {A}_M\varvec{\mu }_M^*\\&\qquad -\mathbf {z}_M^*\Vert ^2)+\left( t-\frac{1}{2}\right) (b_1\Vert \varvec{\beta }^*-\varvec{\beta }_1^*\Vert ^2+...+b_M\Vert \varvec{\beta }^*-\varvec{\beta }_M^*\Vert ^2)\\&\qquad +\left( t-\frac{1}{2}\right) (a_1\Vert \varvec{\mu }^*-\varvec{\mu }_1^*\Vert ^2+...+a_M\Vert \varvec{\mu }^*-\varvec{\mu }_M^*\Vert ^2)\\&\qquad +\left( t-\frac{1}{2}\right) (c\Vert \mathbf {W}\varvec{\mu }^*-\varvec{\eta }^*\Vert ^2)\\&\quad \le l^*, \end{aligned}$$

for any positive integer t. It then follows that

$$\begin{aligned} \lim _{k\rightarrow \infty }\Vert \mathbf {r}^{k} \Vert =\lim _{k\rightarrow \infty }\left\Vert \left( \begin{array}{c} \mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^{k}-\mathbf {A}_1\varvec{\mu }_1^{k}-\mathbf {z}_1^{k}\\ ...\\ \mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^{k}-\mathbf {A}_M\varvec{\mu }_M^{k}-\mathbf {z}_M^{k}\\ \varvec{\beta }^{k} - \varvec{\beta }_1^{k}\\ ...\\ \varvec{\beta }^{k} - \varvec{\beta }_M^{k}\\ \varvec{\mu }^{k} - \varvec{\mu }_1^{k}\\ ...\\ \varvec{\mu }^{k} - \varvec{\mu }_M^{k}\\ \mathbf {W}\varvec{\mu }^{k} - \varvec{\eta }^{k} \end{array} \right) \right\Vert =\left\Vert \left( \begin{array}{c} \mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^*-\mathbf {A}_1\varvec{\mu }_1^*-\mathbf {z}_1^*\\ ...\\ \mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^*-\mathbf {A}_M\varvec{\mu }_M^*-\mathbf {z}_M^*\\ \varvec{\beta }^* - \varvec{\beta }_1^*\\ ...\\ \varvec{\beta }^* - \varvec{\beta }_M^*\\ \varvec{\mu }^* - \varvec{\mu }_1^*\\ ...\\ \varvec{\mu }^* - \varvec{\mu }_M^*\\ \mathbf {W}\varvec{\mu }^* - \varvec{\eta }^* \end{array} \right) \right\Vert =0. \end{aligned}$$

Thus, the primal feasibility condition is met.

For the dual residual, by definition, $\varvec{\mu }^{{k+1}}$ minimizes

$$\begin{aligned} \mathcal {L}(\varvec{\beta },\mathbf {z}_1^{k},...,\mathbf {z}_M^{k},\varvec{\beta }_1^{k},...,\varvec{\beta }_M^{k},\varvec{\mu }_1,...,\varvec{\mu }_M,\varvec{\eta }^{k},\mathbf {s}^{k},\mathbf {u}_1^{k},...,\mathbf {u}_M^{k},\mathbf {v}_1^{k},...,\mathbf {v}_M^{k},\mathbf {q}_1^{k},...,\mathbf {q}_M^{k}), \end{aligned}$$

so we obtain that

$$\begin{aligned} \mathbf {0}_p\in \left( \frac{\partial \mathcal {L}}{\partial \varvec{\mu }}\right) . \end{aligned}$$

Therefore, we have

$$\begin{aligned} \begin{aligned} \mathbf {0}_p&=c\mathbf {W}^{\top }(\mathbf {W}\varvec{\mu }^{k+1}-\varvec{\eta }^{k}) + \sum _{m=1}^Ma_m(\varvec{\mu }^{k+1}-\varvec{\mu }_m^{k}) + \mathbf {W}^{\top }\mathbf {s}^{k}+\sum _{m=1}^M\mathbf {q}_m^{k}\\&=c\mathbf {W}^{\top }(\mathbf {W}\varvec{\mu }^{k+1}-\varvec{\eta }^{k+1}) + \sum _{m=1}^Ma_m(\varvec{\mu }^{k+1}-\varvec{\mu }_m^{k+1}) + \mathbf {W}^{\top }\mathbf {s}^{k+1}+\sum _{m=1}^M\mathbf {q}_m^{k+1}\\&\quad +c\mathbf {W}^{\top }(\varvec{\eta }^{k+1}-\varvec{\eta }^{k}) + \sum _{m=1}^Ma_m(\varvec{\mu }_m^{k+1}-\varvec{\mu }_m^{k})- \mathbf {W}^{\top }(\mathbf {s}^{k+1}-\mathbf {s}^{k})-\sum _{m=1}^M(\mathbf {q}_m^{k+1}-\mathbf {q}_m^{k})\\&=\mathbf {W}^{\top }\mathbf {s}^{k+1}+\sum _{m=1}^M\mathbf {q}_m^{k+1}+c\mathbf {W}^{\top }(\varvec{\eta }^{k+1}-\varvec{\eta }^{k}) + \sum _{m=1}^Ma_m(\varvec{\mu }_m^{k+1}-\varvec{\mu }_m^{k}). \end{aligned} \end{aligned}$$

Thus, the dual residual is given by

$$\begin{aligned} \mathbf {d}^{k+1} = -c\mathbf {W}^{\top }(\varvec{\eta }^{k+1}-\varvec{\eta }^{k}) - \sum _{m=1}^Ma_m(\varvec{\mu }_m^{k+1}-\varvec{\mu }_m^{k}). \end{aligned}$$

Furthermore, we obtain that

$$\begin{aligned} \lim _{k\rightarrow \infty }\mathbf {d}^{k+1} = -c\mathbf {W}^{\top }(\varvec{\eta }^*-\varvec{\eta }^*) - \sum _{m=1}^Mb_m(\varvec{\mu }_m^*-\varvec{\mu }_m^*) = \mathbf {0}_p, \end{aligned}$$

which implies that the dual feasibility is satisfied. $\square $

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zhang, S., Feng, X. Distributed identification of heterogeneous treatment effects. Comput Stat 37, 57–89 (2022). https://doi.org/10.1007/s00180-021-01114-2

Download citation

Received: 05 April 2020
Accepted: 17 May 2021
Published: 29 May 2021
Issue Date: March 2022
DOI: https://doi.org/10.1007/s00180-021-01114-2

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Distributed identification of heterogeneous treatment effects

Abstract

Access this article

Similar content being viewed by others

Communication-efficient estimation for distributed subset selection

Multiple imputation for analysis of incomplete data in distributed health data networks

DLMM as a lossless one-shot algorithm for collaborative multi-site distributed linear mixed models

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix

1.1 The computing details of the algorithm DisSRADMM

1.2 Proof of Theorem 1

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Distributed identification of heterogeneous treatment effects

Abstract

Access this article

Similar content being viewed by others

Communication-efficient estimation for distributed subset selection

Multiple imputation for analysis of incomplete data in distributed health data networks

DLMM as a lossless one-shot algorithm for collaborative multi-site distributed linear mixed models

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix

Appendix

1.1 The computing details of the algorithm DisSRADMM

1.2 Proof of Theorem 1

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation