Distributed identification of heterogeneous treatment effects

Original paper, Computational Statistics

Abstract

In many areas, including precision medicine and financial investment, the analysis of heterogeneous treatment effects has become important. In this paper, we focus on identifying subgroups by combining data held in a distributed storage system. We propose a distributed algorithm based on the alternating direction method of multipliers, which preserves the privacy of subjects well. The method can handle large-scale data and performs well in identifying subgroups as long as there are sufficient samples in the distributed storage system as a whole, though not necessarily in every computing node. Our numerical study indicates that the proposed method is promising in many interesting cases.

References

  • Abe M, Nakagawa K (2019) Deep learning for multi-factor models in regional and global stock markets. In: JSAI international symposium on artificial intelligence. Springer, pp 87–102

  • Abe M, Nakagawa K (2020) Cross-sectional stock price prediction using deep learning for actual investment management. arXiv preprint arXiv:2002.06975

  • Becker SO, Egger PH, Von Ehrlich M (2013) Absorptive capacity and the growth and investment effects of regional transfers: a regression discontinuity design with heterogeneous treatment effects. Am Econ J Econ Policy 5(4):29–77

  • Bottou L (2010) Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT’2010. Springer, pp 177–186

  • Boyd S, Parikh N, Chu E, Peleato B, Eckstein J et al (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends® Mach Learn 3(1):1–122

  • Dewandaru G, Rizvi SAR, Masih R, Masih M, Alhabshi SO (2014) Stock market co-movements: Islamic versus conventional equity indices with multi-timescales analysis. Econ Syst 38(4):553–571

  • Eryiğit M, Eryiğit R (2009) Network structure of cross-correlations among the world market indices. Physica A 388(17):3551–3562

  • Everitt B, Hand D (1981) Finite mixture distributions. Chapman and Hall, New York

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360

  • Farewell VT (1982) The use of mixture models for the analysis of survival data with long-term survivors. Biometrics 38:1041–1046

  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference and prediction, 2nd edn. Springer, Berlin

  • He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  • Lee JYL, Brown JJ, Ryan LM (2017) Sufficiency revisited: rethinking statistical algorithms in the big data era. Am Stat 71:202–208

  • Lu Z, Leen TK (2007) Penalized probabilistic clustering. Neural Comput 19(6):1528–1567

  • Ma S, Huang J (2017) A concave pairwise fusion approach to subgroup analysis. J Am Stat Assoc 112(517):410–423

  • McAuliffe JD, Blei DM, Jordan MI (2006) Nonparametric empirical Bayes for the Dirichlet process mixture model. Stat Comput 16(1):5–14

  • Ouyang H, He N, Tran L, Gray A (2013) Stochastic alternating direction method of multipliers. In: International conference on machine learning, pp 80–88

  • Pauler DK, Laird NM (2000) A mixture model for longitudinal data with application to assessment of noncompliance. Biometrics 56(2):464–472

  • Qiao H, Xia Y, Li Y (2016) Can network linkage effects determine return? Evidence from Chinese stock market. PLoS ONE 11(6):e0156784

  • Richardson S, Green PJ (1997) On Bayesian analysis of mixtures with an unknown number of components (with discussion). J R Stat Soc Ser B (Stat Methodol) 59(4):731–792

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288

  • Tseng P (2001) Convergence of a block coordinate descent method for nondifferentiable minimization. J Optim Theory Appl 109(3):475–494

  • Valentino SW, Moore JE, Cleveland MJ, Greenberg MT, Tan X (2014) Profiles of financial stress over time using subgroup analysis. J Fam Econ Issues 35(1):51–64

  • Wang H, Banerjee A (2013) Online alternating direction method (longer version). Technical report. arXiv:1306.3721

  • Wang H, Li R, Tsai CL (2007) Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika 94(3):553–568

  • Wang X, Grimson WEL, Westin CF (2011) Tractography segmentation using a hierarchical Dirichlet processes mixture model. NeuroImage 54(1):290–302

  • Wu Z, Yu J (2019) A multi-level descriptor using ultra-deep feature for image retrieval. Multimed Tools Appl 78(18):25655–25672

  • Xu L, Krzyzak A, Oja E (1993) Rival penalized competitive learning for clustering analysis, RBF net, and curve detection. IEEE Trans Neural Netw 4(4):636–649

  • Zhang CH (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38(2):894–942

  • Zhang R, Huang C, Zhang W, Chen S et al (2018) Multi factor stock selection model based on LSTM. Int J Econ Financ 10(8):1–36

Acknowledgements

The authors are grateful to anonymous reviewers for their comments and suggestions on an earlier draft of the paper.

Author information

Corresponding author

Correspondence to Xingdong Feng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Xingdong Feng is the corresponding author. His research was partially supported by the National Natural Science Foundation of China (11971292 and 11690012) and the Program for Innovative Research Team of Shanghai University of Finance and Economics.

Appendix

1.1 Computational details of the algorithm DisSRADMM

In this section, we give the details of solving problems (10)–(13) in the proposed algorithm DisSRADMM.

Update \(\mathbf {z}_m^{k+1}\): consider

$$\begin{aligned} \begin{aligned} \mathbf {z}^{k+1}_{m}&= \arg \min _{\mathbf {z}_m}\mathcal {L}(\varvec{\mu }^k,\varvec{\beta }^k,\varvec{\eta }^k,\mathbf {z}_m,\mathbf {z}_{-m}^k,\varvec{\beta }_1^k,...,\varvec{\beta }_M^k, \varvec{\mu }_1^k,...,\varvec{\mu }_M^k, \\&\qquad \mathbf {s}^k, \mathbf {u}_1^k,...\mathbf {u}_{M}^k, \mathbf {q}_1^k,...\mathbf {q}_{M}^k,\mathbf {v}_1^k,...\mathbf {v}_{M}^k). \end{aligned} \end{aligned}$$

So we have

$$\begin{aligned} \begin{aligned} \mathbf {z}_m^{k+1}&= \arg \min _{\mathbf {z}_m}\bigg \{\frac{1}{2}\Vert \mathbf {z}_m\Vert ^2 + \frac{d_{m}}{2}\Vert \mathbf {y}_m - \mathbf {X}_m\varvec{\beta }_m^k - \mathbf {A}_m\varvec{\mu }_m^k-\mathbf {z}_m\Vert ^2\\&\quad +\langle \mathbf {y}_m - \mathbf {X}_m\varvec{\beta }_m^k - \mathbf {A}_m\varvec{\mu }_m^k-\mathbf {z}_m,\mathbf {u}_m^k \rangle \bigg \}. \end{aligned} \end{aligned}$$

The previous problem can be solved elementwise, so for any \(i=1,2,...,n_m\), we carry out the following updating procedure

$$\begin{aligned} \begin{aligned} z_{m,i}^{k+1}&= \arg \min _{z_{m,i}}\bigg \{\frac{1}{2}\Vert z_{m,i}\Vert ^2 +\frac{d_{m}}{2}\Vert \mathbf {y}_{m,i} - \mathbf {x}_{m,i}^{\top }\varvec{\beta }_m^k - \mathbf {A}_{m,i}^{\top }\varvec{\mu }_m^k-\mathbf {z}_{m,i}\Vert ^2\\&\quad +\langle \mathbf {y}_{m,i}- \mathbf {x}_{m,i}^{\top }\varvec{\beta }_m^k - \mathbf {A}_{m,i}^{\top }\varvec{\mu }_m^k-\mathbf {z}_{m,i},\mathbf {u}_{m,i}^k \rangle \bigg \}, \end{aligned} \end{aligned}$$

where \(\mathbf {A}_{m,i}\) represents the ith row vector of the matrix \(\mathbf {A}_m\). Thus, we consider the following equations

$$\begin{aligned} \begin{aligned} 0&=\frac{\partial }{\partial z_{m,i}}\bigg \{\frac{1}{2}\Vert z_{m,i}\Vert ^2 +\frac{d_{m}}{2}\Vert \mathbf {y}_{m,i} - \mathbf {x}_{m,i}^{\top }\varvec{\beta }_m^k - \mathbf {A}_{m,i}^{\top }\varvec{\mu }_m^k-\mathbf {z}_{m,i}\Vert ^2\\&\quad +\langle \mathbf {y}_{m,i}- \mathbf {x}_{m,i}^{\top }\varvec{\beta }_m^k - \mathbf {A}_{m,i}^{\top }\varvec{\mu }_m^k-\mathbf {z}_{m,i},\mathbf {u}_{m,i}^k \rangle \bigg \}\\&=z_{m,i} - d_m(y_{m,i}-\mathbf {x}_{m,i}^{\top }\varvec{\beta }_m^k-\mathbf {A}_{m,i}^{\top }\varvec{\mu }_m^k-z_{m,i})-u_{m,i}^k. \end{aligned} \end{aligned}$$

It then follows that

$$\begin{aligned} z_{m,i} = \left\{ d_m(y_{m,i}-\mathbf {x}_{m,i}^{\top }\varvec{\beta }_m^k-\mathbf {A}_{m,i}^{\top }\varvec{\mu }_m^k)+u_{m,i}^k\right\} \bigg /\left( d_m+1 \right) . \end{aligned}$$
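
For illustration, the following is a minimal NumPy sketch of this elementwise update; the function and variable names are ours and not taken from the paper's implementation.

```python
import numpy as np

def update_z(y_m, X_m, beta_m, A_mu_m, u_m, d_m):
    """Closed-form z_m update; A_mu_m stands for A_m @ mu_m^k (illustrative names)."""
    r = y_m - X_m @ beta_m - A_mu_m          # residual without z_m
    return (d_m * r + u_m) / (d_m + 1.0)     # elementwise closed form
```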

Update \(\varvec{\beta }_m^{k+1}\): note that

$$\begin{aligned} \begin{aligned} \varvec{\beta }^{k+1}_{m}&= \arg \min _{\varvec{\beta }_m}\mathcal {L}(\varvec{\mu }^k,\varvec{\beta }^k,\varvec{\beta }_m,\varvec{\beta }_{-m}^k,\varvec{\eta }^k,\varvec{\mu }_1^k,...,\varvec{\mu }_M^k, \mathbf {z}_1^k,...,\mathbf {z}_M^k, \\&\quad \mathbf {s}^k, \mathbf {u}_1^k,...\mathbf {u}_{M}^k, \mathbf {q}_1^k,...\mathbf {q}_{M}^k,\mathbf {v}_1^k,...\mathbf {v}_{M}^k), \end{aligned} \end{aligned}$$

which is given by

$$\begin{aligned} \begin{aligned} \varvec{\beta }_m^{k+1}&= \arg \min _{\varvec{\beta }_m}\bigg \{\frac{d_{m}}{2}\Vert \mathbf {y}_m - \mathbf {X}_m\varvec{\beta }_m - \mathbf {A}_m\varvec{\mu }_m^k-\mathbf {z}_m^k\Vert ^2\\&\quad +\langle \mathbf {y}_m - \mathbf {X}_m\varvec{\beta }_m - \mathbf {A}_m\varvec{\mu }_m^k-\mathbf {z}_m^k,\mathbf {u}_m^k \rangle + \frac{b_{m}}{2}\Vert \varvec{\beta }_{m}-\varvec{\beta }^k\Vert ^2\\&\quad + \langle \varvec{\beta }_{m}-\varvec{\beta }^k,\mathbf {v}_m^k \rangle \bigg \}. \end{aligned} \end{aligned}$$

We then have the following equations

$$\begin{aligned} \begin{aligned} 0&= \frac{\partial }{\partial \varvec{\beta }_m}\bigg \{\frac{d_{m}}{2}\Vert \mathbf {y}_m - \mathbf {X}_m\varvec{\beta }_m - \mathbf {A}_m\varvec{\mu }_m^k-\mathbf {z}_m^k\Vert ^2+\langle \mathbf {y}_m - \mathbf {X}_m\varvec{\beta }_m - \mathbf {A}_m\varvec{\mu }_m^k-\mathbf {z}_m^k,\mathbf {u}_m^k \rangle \\&\quad + \frac{b_{m}}{2}\Vert \varvec{\beta }_{m}-\varvec{\beta }^k\Vert ^2+ \langle \varvec{\beta }_{m}-\varvec{\beta }^k,\mathbf {v}_m^k \rangle \bigg \}\\&=-d_m\left( \mathbf {X}_m^{\top }\mathbf {y}_m - \mathbf {X}_m^{\top }\mathbf {X}_m\varvec{\beta }_m - \mathbf {X}_m^{\top }\mathbf {A}_m\varvec{\mu }_m^k-\mathbf {X}_m^{\top }\mathbf {z}_m^k \right) - \mathbf {X}_m^{\top }\mathbf {u}_m^k\\&\quad +b_m(\varvec{\beta }_m-\varvec{\beta }^k) + \mathbf {v}_m^k\\&=(d_m\mathbf {X}_m^{\top }\mathbf {X}_m + b_m\mathbf {I})\varvec{\beta }_m - d_m(\mathbf {X}_m^{\top }\mathbf {y}_m - \mathbf {X}_m^{\top }\mathbf {A}_m\varvec{\mu }_m^k-\mathbf {X}_m^{\top }\mathbf {z}_m^k)-\mathbf {X}_m^{\top }\mathbf {u}_m^k - b_m\varvec{\beta }^k + \mathbf {v}_m^k. \end{aligned} \end{aligned}$$

Hence, it follows that

$$\begin{aligned} \begin{aligned} (d_m\mathbf {X}_m^{\top }\mathbf {X}_m + b_m\mathbf {I})\varvec{\beta }_m&= d_m[\mathbf {X}_m^{\top }\mathbf {y}_m - \mathbf {X}_m^{\top }\mathbf {A}_m\varvec{\mu }_m^k-\mathbf {X}_m^{\top }\mathbf {z}_m^k]+\mathbf {X}_m^{\top }\mathbf {u}_m^k+ b_m\varvec{\beta }^k\\&\quad -\mathbf {v}_m^k, \end{aligned} \end{aligned}$$

which is equivalent to

$$\begin{aligned} \begin{aligned} \left( \frac{d_m}{b_m}\mathbf {X}_m^{\top }\mathbf {X}_m + \mathbf {I}\right) \varvec{\beta }_m&= \frac{d_m}{b_m}(\mathbf {X}_m^{\top }\mathbf {y}_m - \mathbf {X}_m^{\top }\mathbf {A}_m\varvec{\mu }_m^k-\mathbf {X}_m^{\top }\mathbf {z}_m)+\frac{1}{b_m}\mathbf {X}_m^{\top }\mathbf {u}_m^k\\&\quad + \varvec{\beta }^k-\frac{1}{b_m}\mathbf {v}_m^k. \end{aligned} \end{aligned}$$

Let \(o_1=n_1+n_2+...+n_{m-1}\) and \(o_2=o_1+n_m\). Here \(\varvec{\mu }_{m,in}\) denotes the subvector of \(\varvec{\mu }_m\) consisting of the \(o_1\)th to the \(o_2\)th elements, and \(\varvec{\mu }_{m,out}\) denotes all the other elements of \(\varvec{\mu }_m\). Then, we have

$$\begin{aligned} \begin{aligned} \varvec{\beta }_m^{k+1}&= \left( \frac{d_m}{b_m}\mathbf {X}_m^{\top }\mathbf {X}_m + \mathbf {I}\right) ^{-1}\bigg \{\frac{d_m}{b_m}(\mathbf {X}_m^{\top }\mathbf {y}_m - \mathbf {X}_m^{\top }\varvec{\mu }_{m,in}^k-\mathbf {X}_m^{\top }\mathbf {z}_m)+\frac{1}{b_m}\mathbf {X}_m^{\top }\mathbf {u}_m^k\\&\quad + \varvec{\beta }^k-\frac{1}{b_m}\mathbf {v}_m^k\bigg \}. \end{aligned} \end{aligned}$$
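
As a sketch, this update amounts to solving a ridge-type linear system of dimension p. Below is a minimal NumPy version under the assumption that \(\mathbf {A}_m\varvec{\mu }_m^k\) has already been reduced to \(\varvec{\mu }_{m,in}^k\) as above; all names are illustrative.

```python
import numpy as np

def update_beta_m(y_m, X_m, mu_in_m, z_m, u_m, beta_bar, v_m, d_m, b_m):
    """Solve ((d_m/b_m) X^T X + I) beta_m = rhs (illustrative names)."""
    p = X_m.shape[1]
    rhs = ((d_m / b_m) * X_m.T @ (y_m - mu_in_m - z_m)
           + X_m.T @ u_m / b_m + beta_bar - v_m / b_m)
    lhs = (d_m / b_m) * X_m.T @ X_m + np.eye(p)
    return np.linalg.solve(lhs, rhs)
```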

Update \(\varvec{\mu }_m^{k+1}\): consider

$$\begin{aligned} \begin{aligned} \varvec{\mu }^{k+1}_{m}&= \arg \min _{\varvec{\mu }_m}\mathcal {L}(\varvec{\mu }^k,\varvec{\beta }^k,\varvec{\eta }^k,\varvec{\mu }_m,\varvec{\mu }_{-m}^k,\varvec{\beta }_1^k,...,\varvec{\beta }_M^k, \mathbf {z}_1^k,...,\mathbf {z}_M^k, \\&\quad \mathbf {s}^k, \mathbf {u}_1^k,...\mathbf {u}_{M}^k, \mathbf {q}_1^k,...\mathbf {q}_{M}^k,\mathbf {v}_1^k,...\mathbf {v}_{M}^k), \end{aligned} \end{aligned}$$

which is given by

$$\begin{aligned} \begin{aligned} \varvec{\mu }_m^{k+1}&= \arg \min _{\varvec{\mu }_m}\bigg \{\frac{d_{m}}{2}\Vert \mathbf {y}_m - \mathbf {X}_m\varvec{\beta }_m^k - \mathbf {A}_m\varvec{\mu }_m-\mathbf {z}_m^k\Vert ^2\\&\quad +\langle \mathbf {y}_m - \mathbf {X}_m\varvec{\beta }_m^k - \mathbf {A}_m\varvec{\mu }_m-\mathbf {z}_m^k,\mathbf {u}_m^k \rangle + \frac{a_{m}}{2}\Vert \varvec{\mu }_{m}-\varvec{\mu }^k\Vert ^2\\&\quad + \langle \varvec{\mu }_{m}-\varvec{\mu }^k,\mathbf {q}_m^k \rangle \bigg \}. \end{aligned} \end{aligned}$$

This optimization problem can be solved by considering the following equations

$$\begin{aligned} \begin{aligned} 0&= \frac{\partial }{\partial \varvec{\mu }_m}\bigg \{\frac{d_{m}}{2}\Vert \mathbf {y}_m - \mathbf {X}_m\varvec{\beta }_m^k - \mathbf {A}_m\varvec{\mu }_m-\mathbf {z}_m^k\Vert ^2+\langle \mathbf {y}_m - \mathbf {X}_m\varvec{\beta }_m^k - \mathbf {A}_m\varvec{\mu }_m-\mathbf {z}_m^k,\mathbf {u}_m^k \rangle \\&\quad + \frac{a_{m}}{2}\Vert \varvec{\mu }_{m}-\varvec{\mu }\Vert ^2+ \langle \varvec{\mu }_{m}-\varvec{\mu },\mathbf {q}_m^k \rangle \bigg \}\\&=-d_m\left( \mathbf {A}_m^{\top }\mathbf {y}_m - \mathbf {A}_m^{\top }\mathbf {X}_m\varvec{\beta }_m^k - \mathbf {A}_m^{\top }\mathbf {A}_m\varvec{\mu }_m-\mathbf {A}_m^{\top }\mathbf {z}_m^k \right) - \mathbf {A}_m^{\top }\mathbf {u}_m^k\\&\quad +a_m(\varvec{\mu }_m-\varvec{\mu }^k) + \mathbf {q}_m^k\\&=(d_m\mathbf {A}_m^{\top }\mathbf {A}_m + a_m\mathbf {I})\varvec{\mu }_m - d_m(\mathbf {A}_m^{\top }\mathbf {y}_m - \mathbf {A}_m^{\top }\mathbf {X}_m\varvec{\beta }_m^k-\mathbf {A}_m^{\top }\mathbf {z}_m^k)-\mathbf {A}_m^{\top }\mathbf {u}_m^k - a_m\varvec{\mu }^k + \mathbf {q}_m^k. \end{aligned} \end{aligned}$$

Hence, we obtain that

$$\begin{aligned} \begin{aligned}&\left( \frac{d_m}{a_m}\mathbf {A}_m^{\top }\mathbf {A}_m + \mathbf {I}\right) \varvec{\mu }_m = \frac{d_m}{a_m}(\mathbf {A}_m^{\top }\mathbf {y}_m - \mathbf {A}_m^{\top }\mathbf {X}_m\varvec{\beta }_m^k-\mathbf {A}_m^{\top }\mathbf {z}_m)+\frac{1}{a_m}\mathbf {A}_m^{\top }\mathbf {u}_m^k\\&\quad + \varvec{\mu }^k-\frac{1}{a_m}\mathbf {q}_m^k. \end{aligned} \end{aligned}$$

Although this involves computing the inverse of a large matrix, its special structure can be exploited. Suppose the samples are arranged by node. Then the element in the \(i\)th row and \(j\)th column of the matrix \(\mathbf {A}_m\) is given by

$$\begin{aligned} A_{m,ij}=\left\{ \begin{array}{lcl} 1&{}&{}o_{m1}\le i=j\le o_{m2}\\ 0&{}&{} \textit{otherwise} \end{array}, \right. \end{aligned}$$

where \(o_{m1}=n_1+n_2+\cdots +n_{m-1}\) and \(o_{m2}=o_{m1}+n_m\). Due to this special structure, we divide \(\varvec{\mu }_m\) into two parts \(\varvec{\mu }_{m,in}\) and \(\varvec{\mu }_{m,out}\), where \(\varvec{\mu }_{m,in}\) is composed of the \(o_{m1}\)th to the \(o_{m2}\)th elements, and \(\varvec{\mu }_{m,out}\) contains all the remaining elements. A similar partition is applied to \(\varvec{\mu }^k\) and \(\mathbf {q}_m^k\), yielding \(\varvec{\mu }_{in}^k\), \(\varvec{\mu }_{out}^k\), \(\mathbf {q}_{m,in}^k\) and \(\mathbf {q}_{m,out}^k\). Then we have

$$\begin{aligned} \varvec{\mu }_{m,in}^{k+1} = \frac{a_m}{d_m+a_m}\left\{ \frac{d_m}{a_m}(\mathbf {y}_m-\mathbf {X}_m\varvec{\beta }_m^k-\mathbf {z}_m^k)+\frac{1}{a_m}\mathbf {u}_m^k + \varvec{\mu }_{in}^k - \frac{1}{a_m}\mathbf {q}_{m,in}^k\right\} . \end{aligned}$$

Since the problem can also be solved elementwise, for any \(i=1,2,...,n_m\), we obtain that

$$\begin{aligned} \mu _{m,in,i}^{k+1} = \frac{d_m}{d_m+a_m}(y_{m,i}-\mathbf {x}_{m,i}^{\top }\varvec{\beta }_m^k-z_{m,i})+\frac{1}{d_m+a_m}(u_{m,in,i}^k+a_m\mu _{in,i}^k-q_{m,in,i}^k). \end{aligned}$$

Similarly, we can update \(\varvec{\mu }_{m,out}^{k+1}\) elementwise:

$$\begin{aligned} \mu _{m,out,i}^{k+1}=\mu _{out,i}^k-\frac{1}{a_m}q_{m,out,i}^k. \end{aligned}$$
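
Putting the two blocks together, a minimal NumPy sketch of the \(\varvec{\mu }_m\) update is given below; the names are illustrative rather than those of the paper's code.

```python
import numpy as np

def update_mu_m(y_m, X_m, beta_m, z_m, u_m, mu_bar_in, mu_bar_out,
                q_in, q_out, d_m, a_m):
    """mu_bar_in/out are the 'in'/'out' blocks of mu^k; q_in/out those of q_m^k."""
    r = y_m - X_m @ beta_m - z_m
    # "in" block: data-fit term plus consensus term
    mu_in = (d_m * r + u_m + a_m * mu_bar_in - q_in) / (d_m + a_m)
    # "out" block: consensus term only
    mu_out = mu_bar_out - q_out / a_m
    return mu_in, mu_out
```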

Update \(\varvec{\mu }^{k+1}\): note that

$$\begin{aligned} \begin{aligned} \varvec{\mu }^{k+1}&= \arg \min _{\varvec{\mu }}\mathcal {L}(\varvec{\mu },\varvec{\beta }^k,\varvec{\eta }^k,\varvec{\mu }_1^k,...,\varvec{\mu }_M^k,\varvec{\beta }_1^k,...,\varvec{\beta }_M^k, \mathbf {z}_1^k,...,\mathbf {z}_M^k, \\&\qquad \mathbf {s}^k, \mathbf {u}_1^k,...\mathbf {u}_{M}^k, \mathbf {q}_1^k,...\mathbf {q}_{M}^k,\mathbf {v}_1^k,...\mathbf {v}_{M}^k), \end{aligned} \end{aligned}$$

which is given by

$$\begin{aligned} \begin{aligned} \varvec{\mu }^{k+1}&= \arg \min _{\varvec{\mu }}\frac{c}{2}\Vert \mathbf {W}\varvec{\mu }-\varvec{\eta }\Vert ^2+<\mathbf {W}\varvec{\mu }-\varvec{\eta },\mathbf {s}>+\sum _{m=1}^{M}\Big (\frac{a_{m}}{2}\Vert \varvec{\mu }_{m}-\varvec{\mu }\Vert ^2+ \\&\qquad \langle \varvec{\mu }_{m}-\varvec{\mu },\mathbf {q}_m \rangle \Big ). \end{aligned} \end{aligned}$$

Thus, we can solve the following equations

$$\begin{aligned} \begin{aligned} 0&= c\left( \mathbf {W}^{\top }\mathbf {W}\varvec{\mu }-\mathbf {W}^{\top }\varvec{\eta }^k\right) +\mathbf {W}^{\top }\mathbf {s}^k+\sum _{m=1}^M\left( a_m\varvec{\mu }-a_m\varvec{\mu }_m^{k+1}-\mathbf {q}_m^{k+1}\right) \\&=c\left( \mathbf {W}^{\top }\mathbf {W}\varvec{\mu }-\mathbf {W}^{\top }\varvec{\eta }^k\right) +\mathbf {W}^{\top }\mathbf {s}^k+\sum _{m=1}^M a_m\varvec{\mu }-\sum _{m=1}^M a_m\varvec{\mu }_m^{k+1}-\sum _{m=1}^M \mathbf {q}_m^{k+1}. \end{aligned} \end{aligned}$$

We then have

$$\begin{aligned} \left( \mathbf {W}^{\top }\mathbf {W}+\frac{1}{c}\sum _{m=1}^Ma_m\mathbf {I}\right) \varvec{\mu }= \mathbf {W}^{\top }\varvec{\eta }^k-\frac{1}{c}\mathbf {W}^{\top }\mathbf {s}^k+\frac{1}{c}\sum _{m=1}^M a_m\varvec{\mu }_m^{k+1}+\frac{1}{c}\sum _{m=1}^M \mathbf {q}_m^{k+1}. \end{aligned}$$

Let \(\varvec{\mu }_s^{k+1}=\sum _{m=1}^Ma_m\varvec{\mu }_m^{k+1}\) and \(\mathbf {q}_s^{k+1} = \sum _{m=1}^M\mathbf {q}_m^{k+1}\). Note that \(\mathbf {W}^{\top }\mathbf {W}= N\mathbf {I}- \mathbf {1}\mathbf {1}^{\top }\), and \(\mathbf {W}\) is a sparse matrix. Let \(r_{ij}\) denote the element in the \(i\)th row and the \(j\)th column of the matrix \(\left( \mathbf {W}^{\top }\mathbf {W}+\frac{1}{c}\sum _{m=1}^Ma_m\mathbf {I}\right) ^{-1}\), and let \(S=N-1+\frac{1}{c}\sum _{m=1}^Ma_m\). Then we have

$$\begin{aligned} r_{ij}=\left\{ \begin{array}{lcl} \frac{S+2-N}{(S+1)(S+1-N)}&{}&{} i=j\\ \frac{1}{(S+1)(S+1-N)}&{}&{} i\ne j \end{array} \right. , \end{aligned}$$

where \(S+1-N=\frac{1}{c}\sum _{m=1}^Ma_m>0\).

The details of updating \(\varvec{\mu }^{k+1}\) are given in Algorithm 2, and the computational complexity of updating \(\varvec{\mu }^{k+1}\) is \(O(N^2)\).
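
As an aside, because the system matrix equals \(\big (N+\frac{1}{c}\sum _{m=1}^Ma_m\big )\mathbf {I}-\mathbf {1}\mathbf {1}^{\top }\), its inverse can also be applied in closed form through the Sherman-Morrison identity. The following is a minimal NumPy sketch of that alternative; it is not the paper's Algorithm 2, and the names are ours.

```python
import numpy as np

def solve_mu_system(rhs, a_sum, c, N):
    """Solve (W^T W + (a_sum/c) I) mu = rhs using W^T W = N I - 1 1^T."""
    alpha = N + a_sum / c                     # the matrix is alpha*I - 1 1^T
    # (alpha I - 1 1^T)^{-1} v = v/alpha + sum(v) / (alpha*(alpha - N)) * 1
    return rhs / alpha + rhs.sum() / (alpha * (alpha - N)) * np.ones(N)
```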

Update \(\varvec{\beta }\): note that

$$\begin{aligned} \begin{aligned} \varvec{\beta }^{k+1}&=\arg \min _{\varvec{\beta }}\mathcal {L}(\varvec{\mu }^{k+1},\varvec{\beta },\varvec{\eta }^k,\varvec{\mu }_1^{k+1},...,\varvec{\mu }_M^{k+1},\varvec{\beta }_1^{k+1},...,\varvec{\beta }_M^{k+1}, \mathbf {z}_1^k,...,\mathbf {z}_M^k, \\&\qquad \mathbf {s}^k, \mathbf {u}_1^k,...\mathbf {u}_{M}^k, \mathbf {q}_1^k,...\mathbf {q}_{M}^k,\mathbf {v}_1^k,...\mathbf {v}_{M}^k), \end{aligned} \end{aligned}$$

which is given by

$$\begin{aligned} \varvec{\beta }^{k+1}=\arg \min _{\varvec{\beta }}\sum _{m=1}^{M}\Big (\frac{b_{m}}{2}\Vert \varvec{\beta }_{m}-\varvec{\beta }\Vert ^2+ \langle \varvec{\beta }_{m}-\varvec{\beta },\mathbf {v}_m \rangle \Big ). \end{aligned}$$

We then solve the following equation

$$\begin{aligned} 0=\sum _{m=1}^Mb_m\varvec{\beta }-\sum _{m=1}^Mb_m\varvec{\beta }_m^{k+1}-\sum _{m=1}^M\mathbf {v}_m^{k+1}, \end{aligned}$$

which leads to

$$\begin{aligned} \varvec{\beta }^{k+1} = \frac{1}{\sum _{m=1}^Mb_m}\left( \sum _{m=1}^Mb_m\varvec{\beta }_m^{k+1}+\sum _{m=1}^M\mathbf {v}_m^{k+1}\right) . \end{aligned}$$
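
This is simply a weighted average over the node-level estimates; a minimal sketch with illustrative names follows.

```python
import numpy as np

def update_beta(beta_nodes, v_nodes, b):
    """Consensus update: beta = (sum_m b_m beta_m + v_m) / sum_m b_m."""
    num = sum(b_m * beta_m + v_m
              for b_m, beta_m, v_m in zip(b, beta_nodes, v_nodes))
    return num / sum(b)
```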

Update \(\varvec{\eta }\): note that

$$\begin{aligned} \begin{aligned} \varvec{\eta }^{k+1}&=\arg \min _{\varvec{\eta }}\mathcal {L}(\varvec{\mu }^{k+1},\varvec{\beta }^{k+1},\varvec{\eta },\varvec{\mu }_1^{k+1},...,\varvec{\mu }_M^{k+1},\varvec{\beta }_1^{k+1},...,\varvec{\beta }_M^{k+1}, \mathbf {z}_1^k,...,\mathbf {z}_M^k, \\&\quad \mathbf {s}^k, \mathbf {u}_1^k,...\mathbf {u}_{M}^k, \mathbf {q}_1^k,...\mathbf {q}_{M}^k,\mathbf {v}_1^k,...\mathbf {v}_{M}^k), \end{aligned} \end{aligned}$$

which is given by

$$\begin{aligned} \varvec{\eta }^{k+1} = \arg \min _{\varvec{\eta }}\left( P_N(|\varvec{\eta }|,\varvec{\pi }) + \frac{c}{2} \Vert \mathbf {W}\varvec{\mu }^{k+1} - \varvec{\eta }\Vert ^2 + \langle \mathbf {W}\varvec{\mu }^{k+1} - \varvec{\eta }, \mathbf {s}^k\rangle \right) . \end{aligned}$$

Since the penalty function is convex in \(\eta _{ij}\), 0 must be contained in the subdifferential of the objective at the point \(\eta ^{k+1}_{ij}\):

$$\begin{aligned} \begin{aligned}&0 \in \partial \bigg \{P_N(\varvec{\eta }, \varvec{\pi })+\frac{c}{2}{\Vert \mathbf {W}\varvec{\mu }^{k+1}-\varvec{\eta }\Vert }^2+\langle \mathbf {W}\varvec{\mu }^{k+1}-\varvec{\eta },\mathbf {s}^k\rangle \bigg \}\bigg |_{\varvec{\eta }=\varvec{\eta }^{k+1}}\\ \Rightarrow&0\in \partial P_N(\varvec{\eta }, \varvec{\pi })|_{\varvec{\eta }=\varvec{\eta }^{k+1}} + c\varvec{\eta }^{k+1}-c\mathbf {W}\varvec{\mu }^{k+1}-\mathbf {s}^k. \end{aligned} \end{aligned}$$

Let \(\delta _{ij}=\mathbf {W}_{ij}^T\varvec{\mu }^{k+1}+\frac{1}{c}s_{ij}\). For the \(L_1\) penalty, we then have

$$\begin{aligned} {\hat{\eta }}_{ij}=T(\delta _{ij},\pi /c); \end{aligned}$$

MCP:

$$\begin{aligned} {\hat{\eta }}_{ij}=\left\{ \begin{array}{lcl} \frac{T(\delta _{ij},\pi /c)}{1-1/(\alpha c)}&{}&{}if |\delta _{ij}|\le \alpha \pi \\ \delta _{ij}&{}&{}if |\delta _{ij}|>\alpha \pi \end{array}; \right. \end{aligned}$$

SCAD:

$$\begin{aligned} {\hat{\eta }}_{ij}=\left\{ \begin{array}{lcl} T(\delta _{ij},\pi /c)&{}&{}if|\delta _{ij}|\le \pi +\pi /c\\ \frac{T(\delta _{ij},\alpha \pi /\{(\alpha -1)c\})}{1-1/\{(\alpha -1)c\}}&{}&{}if \pi +\pi /c<|\delta _{ij}|\le \alpha \pi \\ \delta _{ij}&{}&{}if |\delta _{ij}|>\alpha \pi \end{array}, \right. \end{aligned}$$

where \(T(t,\pi )=sign(t)(|t|-\pi )_+\).
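
A minimal NumPy sketch of these three proximal maps is given below, with `pi` and `alpha` standing for \(\pi \) and \(\alpha \); it is an illustrative sketch rather than the paper's code.

```python
import numpy as np

def soft_threshold(t, thr):
    """T(t, thr) = sign(t) * (|t| - thr)_+."""
    return np.sign(t) * np.maximum(np.abs(t) - thr, 0.0)

def eta_update(delta, pi, c, alpha, penalty="scad"):
    """Per-pair eta update for the L1, MCP, and SCAD penalties (scalar delta)."""
    if penalty == "l1":
        return soft_threshold(delta, pi / c)
    if penalty == "mcp":
        if abs(delta) <= alpha * pi:
            return soft_threshold(delta, pi / c) / (1.0 - 1.0 / (alpha * c))
        return delta
    # SCAD
    if abs(delta) <= pi + pi / c:
        return soft_threshold(delta, pi / c)
    if abs(delta) <= alpha * pi:
        return (soft_threshold(delta, alpha * pi / ((alpha - 1.0) * c))
                / (1.0 - 1.0 / ((alpha - 1.0) * c)))
    return delta
```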

1.2 Proof of Theorem 1

Here we take \((\mathbf {z}_1,...,\mathbf {z}_M,\varvec{\beta }_1,...,\varvec{\beta }_M,\varvec{\mu }_1,...,\varvec{\mu }_M,\varvec{\eta })\) as a whole block because their updates can be implemented elementwise. By the definition of

$$\begin{aligned} \big (\mathbf {z}_1^{k+1},...,\mathbf {z}_M^{k+1},\varvec{\beta }_1^{k+1},...,\varvec{\beta }_M^{k+1},\varvec{\mu }_1^{k+1},...,\varvec{\mu }_M^{k+1},\varvec{\eta }^{k+1}\big ), \end{aligned}$$

we have

$$\begin{aligned} \begin{aligned}&\mathcal {L}(\varvec{\beta }^{k+1},\varvec{\mu }^{k+1}, \mathbf {z}_1^{k+1},...,\mathbf {z}_M^{k+1},\varvec{\beta }_1^{k+1},...,\varvec{\beta }_M^{k+1},\varvec{\mu }_1^{k+1},...,\varvec{\mu }_M^{k+1},\varvec{\eta }^{k+1},\\&\qquad \mathbf {s}^{k},\mathbf {u}_1^{k},...,\mathbf {u}_M^{k},\mathbf {v}_1^{k},...,\mathbf {v}_M^{k},\mathbf {q}_1^{k},...,\mathbf {q}_M^{k})\\&\quad \le \mathcal {L}(\varvec{\beta }^{k+1}, \varvec{\mu }^{k+1}, \mathbf {z}_1,...,\mathbf {z}_M,\varvec{\beta }_1,...,\varvec{\beta }_M,\varvec{\mu }_1,...,\varvec{\mu }_M,\varvec{\eta },\mathbf {s}^{k},\mathbf {u}_1^{k},...,\mathbf {u}_M^{k},\mathbf {v}_1^{k},...,\mathbf {v}_M^{k},\\&\qquad \mathbf {q}_1^{k},...,\mathbf {q}_M^{k}), \end{aligned} \end{aligned}$$

for any \((\mathbf {z}_1,...,\mathbf {z}_M,\varvec{\beta }_1,...,\varvec{\beta }_M,\varvec{\mu }_1,...,\varvec{\mu }_M,\varvec{\eta })\).

It then follows that

$$\begin{aligned} \begin{aligned}&\mathcal {L}(\varvec{\beta }^{k+1},\varvec{\mu }^{k+1}, \mathbf {z}_1^{k+1},...,\mathbf {z}_M^{k+1},\varvec{\beta }_1^{k+1},...,\varvec{\beta }_M^{k+1},\varvec{\mu }_1^{k+1},...,\varvec{\mu }_M^{k+1},\varvec{\eta }^{k+1},\\&\qquad \mathbf {s}^{k},\mathbf {u}_1^{k},...,\mathbf {u}_M^{k},\mathbf {v}_1^{k},...,\mathbf {v}_M^{k},\mathbf {q}_1^{k},...,\mathbf {q}_M^{k})\\&\quad \le \inf _{\begin{array}{c} \mathbf {z},\varvec{\beta },\varvec{\mu },\varvec{\eta } \end{array} }\mathcal {L}(\varvec{\beta }^{k+1}, \mathbf {z}_1,...,\mathbf {z}_M,\varvec{\beta }_1,...,\varvec{\beta }_M,\varvec{\mu }_1,...,\varvec{\mu }_M,\varvec{\eta }, \mathbf {s}^{k},\mathbf {u}_1^{k},...,\mathbf {u}_M^{k},\mathbf {v}_1^{k},...,\mathbf {v}_M^{k},\mathbf {q}_1^{k},...,\mathbf {q}_M^{k})\\&\quad =\frac{1}{M}\sum _{m=1}^M\Vert \mathbf {y}_m-\mathbf {X}_m\varvec{\beta }^{k+1}-\mathbf {A}_m\varvec{\mu }^{k+1}\Vert ^2 + P_N((\mathbf {W}\varvec{\mu }^{k+1}), \varvec{\pi })\triangleq l^{k+1}. \end{aligned} \end{aligned}$$

We write the objective function \(\mathcal {L}\) in the following form

$$\begin{aligned} \mathcal {L} = \mathcal {L}_0+\mathcal {L}_1, \end{aligned}$$

where

$$\begin{aligned} \begin{aligned}&\mathcal {L}_0(\varvec{\beta },\varvec{\mu },\mathbf {z}_1,...,\mathbf {z}_M,\varvec{\beta }_1,...,\varvec{\beta }_M,\varvec{\mu }_1,...,\varvec{\mu }_M,\varvec{\eta },\mathbf {s},\mathbf {u}_1,...,\mathbf {u}_M,\mathbf {v}_1,...,\mathbf {v}_M,\mathbf {q}_1,...,\mathbf {q}_M)\\&\quad =\frac{c}{2}\Vert \mathbf {W}\varvec{\mu }-\varvec{\eta }\Vert ^2 +<\mathbf {W}\varvec{\mu }-\varvec{\eta }, \mathbf {s}>\\&\qquad +\sum _{m=1}^M\bigg (\frac{d_m}{2}\Vert \mathbf {y}_m-\mathbf {X}_m\varvec{\beta }_m-\mathbf {A}_m\mu _m-\mathbf {z}_m\Vert ^2\\&\qquad +<\mathbf {y}_m-\mathbf {X}_m\varvec{\beta }_m-\mathbf {A}_m\mu _m-\mathbf {z}_m,\mathbf {u}_m>\bigg )\\&\qquad +\sum _{m=1}^M\left( \frac{b_m}{2}\Vert \varvec{\beta }-\varvec{\beta }_m\Vert ^2+<\varvec{\beta }-\varvec{\beta }_m,\mathbf {v}_m>\right) \\&\qquad +\sum _{m=1}^M\left( \frac{a_m}{2}\Vert \varvec{\mu }-\varvec{\mu }_m\Vert ^2+<\varvec{\mu }-\varvec{\mu }_m,\mathbf {q}_m>\right) ,\\ \text {and}\\&\qquad \mathcal {L}_1(\varvec{\beta },\varvec{\mu },\mathbf {z}_1,...,\mathbf {z}_M,\varvec{\beta }_1,...,\varvec{\beta }_M,\varvec{\mu }_1,...,\varvec{\mu }_M,\varvec{\eta },\mathbf {s},\mathbf {u}_1,...,\mathbf {u}_M,\mathbf {v}_1,...,\mathbf {v}_M,\mathbf {q}_1,...,\mathbf {q}_M)\\&\quad =\frac{1}{M}\sum _{m=1}^M\Vert \mathbf {z}_m\Vert ^2 + P_N(\varvec{\eta }, \varvec{\pi }). \end{aligned} \end{aligned}$$

Note that \(\mathcal {L}_0\) is differentiable with respect to \((\mathbf {z}_m,\varvec{\beta }_m,\varvec{\mu }_m)\) for \(m=1,...,M\), and \(\mathcal {L}_1\) is convex with respect to \(\varvec{\eta }\). By Lemma 3.1 and Theorem 4.1 of Tseng (2001), there exists a stationary point of \(\mathcal {L}\), denoted by \((\varvec{\beta }^*,\mathbf {s}^*,\mathbf {u}_m^*, \mathbf {v}_m^*)\) with \(m = 1,..., M\), and we let \(l^*\) be the corresponding value of the objective function \(\mathcal {L}\).

Therefore we have

$$\begin{aligned} l^* = \lim _{k\rightarrow \infty }l^{k} = \lim _{k\rightarrow \infty }l^{k+t}, \end{aligned}$$

for any positive integer t. Note that

$$\begin{aligned} \left( \begin{array}{c} \mathbf {s}^{k+t}\\ \mathbf {u}_1^{k+t}\\ \vdots \\ \mathbf {u}_M^{k+t}\\ \mathbf {v}_1^{k+t}\\ \vdots \\ \mathbf {v}_M^{k+t}\\ \mathbf {q}_1^{k+t}\\ \vdots \\ \mathbf {q}_M^{k+t} \end{array}\right) = \left( \begin{array}{c} \mathbf {s}^{k}\\ \mathbf {u}_1^{k}\\ \vdots \\ \mathbf {u}_M^{k}\\ \mathbf {v}_1^{k}\\ \vdots \\ \mathbf {v}_M^{k}\\ \mathbf {q}_1^{k}\\ \vdots \\ \mathbf {q}_M^{k} \end{array}\right) + \sum _{i=1}^{t-1}\left( \begin{array}{c} c(\mathbf {W}\varvec{\mu }^{k+i}-\varvec{\eta }^{k+i})\\ d_1(\mathbf {y}_1 - \mathbf {X}_1\varvec{\beta }_1^{k+i} -\mathbf {A}_1\varvec{\mu }_1^{k+i}- \mathbf {z}_1^{k+i})\\ \vdots \\ d_M(\mathbf {y}_M - \mathbf {X}_M\varvec{\beta }_M^{k+i} -\mathbf {A}_M\varvec{\mu }_M^{k+i}- \mathbf {z}_M^{k+i})\\ b_1(\varvec{\beta }^{k+i}-\varvec{\beta }_1^{k+i})\\ \vdots \\ b_M(\varvec{\beta }^{k+i}-\varvec{\beta }_M^{k+i})\\ a_1(\varvec{\mu }^{k+i}-\varvec{\mu }_1^{k+i})\\ \vdots \\ a_M(\varvec{\mu }^{k+i}-\varvec{\mu }_M^{k+i}) \end{array} \right) , \end{aligned}$$

so we have

$$\begin{aligned}&\mathcal {L}(\varvec{\beta }^{k+1},\varvec{\mu }^{k+1}, \mathbf {z}_1^{k+t+1},...,\mathbf {z}_M^{k+t+1},\varvec{\beta }_1^{k+t+1},...,\varvec{\beta }_M^{k+t+1},\varvec{\mu }_1^{k+t+1},...,\varvec{\mu }_M^{k+t+1},\varvec{\eta }^{k+t+1},\\&\qquad \mathbf {s}^{k+t},\mathbf {u}_1^{k+t},...,\mathbf {u}_M^{k+t},\mathbf {v}_1^{k+t},...,\mathbf {v}_M^{k+t},\mathbf {q}_1^{k+t},...,\mathbf {q}_M^{k+t})\\&\quad =\frac{1}{M}\sum _{m=1}^M\Vert \mathbf {z}_m^{k+t+1}\Vert ^2 + P_N(\varvec{\eta }_j^{k+t+1}, \varvec{\pi })\\&\qquad +\frac{d_1}{2}\Vert \mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^{k+t+1}-\mathbf {A}_m\mu _1^{k+t+1}-\mathbf {z}_1^{k+t+1}\Vert ^2\\&\qquad +...+\frac{d_M}{2}\Vert \mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^{k+t+1}-\mathbf {A}_M\mu _M^{k+t+1}-\mathbf {z}_M^{k+t+1}\Vert ^2\\&\qquad +\frac{b_1}{2}\Vert \varvec{\beta }^{k+t+1}-\varvec{\beta }_1^{k+t+1}\Vert ^2+...+\frac{b_M}{2}\Vert \varvec{\beta }^{k+t+1}-\varvec{\beta }_M^{k+t+1}\Vert ^2 +\frac{c}{2}\Vert \mathbf {W}\varvec{\beta }^{k+t+1}-\varvec{\eta }^{k+t+1}\Vert ^2\\&\qquad +\frac{a_1}{2}\Vert \varvec{\mu }^{k+t+1}-\varvec{\mu }_1^{k+t+1}\Vert ^2+...+\frac{a_M}{2}\Vert \varvec{\mu }^{k+t+1}-\varvec{\mu }_M^{k+t+1}\Vert ^2\\&\qquad +\mathbf {u}_1^{k+tT}(\mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^{k+t+1}-\mathbf {z}_1^{k+t+1})+...+\mathbf {u}_M^{k+tT}(\mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^{k+t+1}-\mathbf {z}_M^{k+t+1})\\&\qquad +\mathbf {v}_1^{k+tT}(\varvec{\beta }^{k+t+1}-\varvec{\beta }_1^{k+t+1})+...+\mathbf {v}_M^{k+tT}(\varvec{\beta }^{k+t+1}-\varvec{\beta }_M^{k+t+1})\\&\qquad +\mathbf {q}_1^{k+tT}(\varvec{\mu }^{k+t+1}-\varvec{\mu }_1^{k+t+1})+...+\mathbf {q}_M^{k+tT}(\varvec{\mu }^{k+t+1}-\varvec{\mu }_M^{k+t+1})\\&\qquad +\mathbf {s}^{k+tT}(\mathbf {W}\varvec{\mu }^{k+t+1}-\varvec{\eta }^{k+t+1})\\&\quad =\frac{1}{M}\sum _{m=1}^M\Vert \mathbf {z}_m^{k+t+1}\Vert ^2 + P_N(\varvec{\eta }_j^{k+t+1}, \varvec{\pi })\\&\qquad +\frac{a_1}{2}\Vert \mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^{k+t+1}-\mathbf {z}_1^{k+t+1}\Vert ^2+...+\frac{a_M}{2}\Vert \mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^{k+t+1}-\mathbf {z}_M^{k+t+1}\Vert ^2\\&\qquad +\frac{b_1}{2}\Vert \varvec{\beta }^{k+t+1}-\varvec{\beta }_1^{k+t+1}\Vert ^2+...+\frac{b_M}{2}\Vert \varvec{\beta }^{k+t+1}-\varvec{\beta }_M^{k+t+1}\Vert ^2 +\frac{c}{2}\Vert \mathbf {W}\varvec{\beta }^{k+t+1}-\varvec{\eta }^{k+t+1}\Vert ^2\\&\qquad +\mathbf {u}_1^{kT}(\mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^{k+t+1}-\mathbf {z}_1^{k+t+1})+...+\mathbf {u}_M^{kT}(\mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^{k+t+1}-\mathbf {z}_M^{k+t+1})\\&\qquad +\mathbf {v}_1^{kT}(\varvec{\beta }^{k+t+1}-\varvec{\beta }_1^{k+t+1})+...+\mathbf {v}_M^{kT}(\varvec{\beta }^{k+t+1}-\varvec{\beta }_M^{k+t+1})+\mathbf {s}^{kT}(\mathbf {W}\varvec{\beta }^{k+t+1}-\varvec{\eta }^{k+t+1})\\&\qquad +\sum _{i=1}^{t-1}d_1(\mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^{k+i}-\mathbf {A}_1\mu _1^{k+i}-\mathbf {z}_1^{k+i})^{\top }(\mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^{k+t+1}-\mathbf {A}_1\mu _1^{k+t+1}-\mathbf {z}_1^{k+t+1})\\&\qquad \vdots \\&\qquad +\sum _{i=1}^{t-1}d_M(\mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^{k+i}-\mathbf {A}_M\mu _M^{k+i}-\mathbf {z}_M^{k+i})^{\top }(\mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^{k+t+1}-\mathbf {A}_M\mu _M^{k+t+1}\\&\qquad -\mathbf {z}_M^{k+t+1})+\sum _{i=1}^{t-1}b_1(\varvec{\beta }^{k+i}-\varvec{\beta }_1^{k+i})^{\top }(\varvec{\beta }^{k+t+1}-\varvec{\beta }_1^{k+t+1})\\&\qquad \vdots \\&\qquad +\sum _{i=1}^{t-1}b_M(\varvec{\beta 
}^{k+i}-\varvec{\beta }_M^{k+i})^{\top }(\varvec{\beta }^{k+t+1}-\varvec{\beta }_M^{k+t+1})+\sum _{i=1}^{t-1}a_1(\varvec{\mu }^{k+i}-\varvec{\mu }_1^{k+i})^{\top }(\varvec{\mu }^{k+t+1}\\&\qquad -\varvec{\mu }_1^{k+t+1})\\&\qquad \vdots \\&\qquad +\sum _{i=1}^{t-1}a_M(\varvec{\mu }^{k+i}-\varvec{\mu }_M^{k+i})^{\top }(\varvec{\mu }^{k+t+1}-\varvec{\mu }_M^{k+t+1})+\sum _{i=1}^{t-1}c(\mathbf {W}\varvec{\mu }^{k+i}-\varvec{\eta }^{k+i})^{\top }(\mathbf {W}\varvec{\mu }^{k+t+1}\\&\qquad -\varvec{\eta }^{k+t+1})\\&\quad \le l^{k+t+1}. \end{aligned}$$

Hence, we obtain that

$$\begin{aligned}&\lim _{k\rightarrow \infty }\mathcal {L}(\varvec{\beta }^{k+1},\varvec{\mu }^{k+1}, \mathbf {z}_1^{k+t+1},...,\mathbf {z}_M^{k+t+1},\varvec{\beta }_1^{k+t+1},...,\varvec{\beta }_M^{k+t+1},\varvec{\mu }_1^{k+t+1},...,\varvec{\mu }_M^{k+t+1},\varvec{\eta }^{k+t+1},\\&\qquad \mathbf {s}^{k+t},\mathbf {u}_1^{k+t},...,\mathbf {u}_M^{k+t},\mathbf {v}_1^{k+t},...,\mathbf {v}_M^{k+t},\mathbf {q}_1^{k+t},...,\mathbf {q}_M^{k+t})\\&\quad =\frac{1}{M}\sum _{m=1}^M\Vert \mathbf {z}_m^*\Vert ^2 + P_N(\varvec{\eta }_j^*, \varvec{\pi })\\&\qquad +\lim _{k\rightarrow \infty }\mathbf {u}_1^{k\top }(\mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^*-\mathbf {A}_1\varvec{\mu }_1^*-\mathbf {z}_1^*)+...+\lim _{k\rightarrow \infty }\mathbf {u}_M^{k\top }(\mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^*-\mathbf {A}_M\varvec{\mu }_M^*\\&\qquad -\mathbf {z}_M^*)+\lim _{k\rightarrow \infty }\mathbf {v}_1^{k\top }(\varvec{\beta }^*-\varvec{\beta }_1^*)+...+\lim _{k\rightarrow \infty }\mathbf {v}_M^{k\top }(\varvec{\beta }^*-\varvec{\beta }_M^*)\\&\qquad +\lim _{k\rightarrow \infty }\mathbf {q}_1^{k\top }(\varvec{\mu }^*-\varvec{\mu }_1^*)+...+\lim _{k\rightarrow \infty }\mathbf {q}_M^{k\top }(\varvec{\mu }^*-\varvec{\mu }_M^*)\\&\qquad +\lim _{k\rightarrow \infty }\mathbf {s}^{k\top }(\mathbf {W}\varvec{\mu }^*-\varvec{\eta }^*)\\&\qquad +\left( t-\frac{1}{2}\right) (a_1\Vert \mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^*-\mathbf {z}_1^*\Vert ^2+...+a_M\Vert \mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^*-\mathbf {z}_M^*\Vert ^2)\\&\qquad +\left( t-\frac{1}{2}\right) (b_1\Vert \varvec{\beta }^*-\varvec{\beta }_1^*\Vert ^2+...+b_M\Vert \varvec{\beta }^*-\varvec{\beta }_M^*\Vert ^2)\\&\qquad +\left( t-\frac{1}{2}\right) (c\Vert \mathbf {W}\varvec{\beta }^*-\varvec{\eta }^*\Vert ^2)\\&\quad =l^*+\lim _{k\rightarrow \infty }\mathbf {u}_1^{k\top }(\mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^*-\mathbf {A}_1\varvec{\mu }_1^*-\mathbf {z}_1^*)+...+\lim _{k\rightarrow \infty }\mathbf {u}_M^{k\top }(\mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^*-\mathbf {A}_M\varvec{\mu }_m^*\\&\qquad -\mathbf {z}_M^*)+\lim _{k\rightarrow \infty }\mathbf {v}_1^{k\top }(\varvec{\beta }^*-\varvec{\beta }_1^*)+...+\lim _{k\rightarrow \infty }\mathbf {v}_M^{k\top }(\varvec{\beta }^*-\varvec{\beta }_M^*)\\&\qquad +\lim _{k\rightarrow \infty }\mathbf {q}_1^{k\top }(\varvec{\mu }^*-\varvec{\mu }_1^*)+...+\lim _{k\rightarrow \infty }\mathbf {q}_M^{k\top }(\varvec{\mu }^*-\varvec{\mu }_M^*)\\&\qquad +\lim _{k\rightarrow \infty }\mathbf {s}^{k\top }(\mathbf {W}\varvec{\mu }^*-\varvec{\eta }^*)\\&\qquad +\left( t-\frac{1}{2}\right) (d_1\Vert \mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^*-\mathbf {A}_1\varvec{\mu }_1^*-\mathbf {z}_1^*\Vert ^2+...+d_M\Vert \mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^*-\mathbf {A}_M\varvec{\mu }_M^*\\&\qquad -\mathbf {z}_M^*\Vert ^2)+\left( t-\frac{1}{2}\right) (b_1\Vert \varvec{\beta }^*-\varvec{\beta }_1^*\Vert ^2+...+b_M\Vert \varvec{\beta }^*-\varvec{\beta }_M^*\Vert ^2)\\&\qquad +\left( t-\frac{1}{2}\right) (a_1\Vert \varvec{\mu }^*-\varvec{\mu }_1^*\Vert ^2+...+a_M\Vert \varvec{\mu }^*-\varvec{\mu }_M^*\Vert ^2)\\&\qquad +\left( t-\frac{1}{2}\right) (c\Vert \mathbf {W}\varvec{\mu }^*-\varvec{\eta }^*\Vert ^2)\\&\quad \le l^*, \end{aligned}$$

for any positive integer t. It then follows that

$$\begin{aligned} \lim _{k\rightarrow \infty }\Vert \mathbf {r}^{k} \Vert =\lim _{k\rightarrow \infty }\left\Vert \left( \begin{array}{c} \mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^{k}-\mathbf {A}_1\varvec{\mu }_1^{k}-\mathbf {z}_1^{k}\\ ...\\ \mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^{k}-\mathbf {A}_M\varvec{\mu }_M^{k}-\mathbf {z}_M^{k}\\ \varvec{\beta }^{k} - \varvec{\beta }_1^{k}\\ ...\\ \varvec{\beta }^{k} - \varvec{\beta }_M^{k}\\ \varvec{\mu }^{k} - \varvec{\mu }_1^{k}\\ ...\\ \varvec{\mu }^{k} - \varvec{\mu }_M^{k}\\ \mathbf {W}\varvec{\mu }^{k} - \varvec{\eta }^{k} \end{array} \right) \right\Vert =\left\Vert \left( \begin{array}{c} \mathbf {y}_1-\mathbf {X}_1\varvec{\beta }_1^*-\mathbf {A}_1\varvec{\mu }_1^*-\mathbf {z}_1^*\\ ...\\ \mathbf {y}_M-\mathbf {X}_M\varvec{\beta }_M^*-\mathbf {A}_M\varvec{\mu }_M^*-\mathbf {z}_M^*\\ \varvec{\beta }^* - \varvec{\beta }_1^*\\ ...\\ \varvec{\beta }^* - \varvec{\beta }_M^*\\ \varvec{\mu }^* - \varvec{\mu }_1^*\\ ...\\ \varvec{\mu }^* - \varvec{\mu }_M^*\\ \mathbf {W}\varvec{\mu }^* - \varvec{\eta }^* \end{array} \right) \right\Vert =0. \end{aligned}$$

Thus, the primal feasibility condition is met.

For the dual residual, by definition, \(\varvec{\mu }^{{k+1}}\) minimizes

$$\begin{aligned} \mathcal {L}(\varvec{\mu },\varvec{\beta }^{k},\varvec{\eta }^{k},\varvec{\mu }_1^{k},...,\varvec{\mu }_M^{k},\varvec{\beta }_1^{k},...,\varvec{\beta }_M^{k},\mathbf {z}_1^{k},...,\mathbf {z}_M^{k},\mathbf {s}^{k},\mathbf {u}_1^{k},...,\mathbf {u}_M^{k},\mathbf {v}_1^{k},...,\mathbf {v}_M^{k},\mathbf {q}_1^{k},...,\mathbf {q}_M^{k}), \end{aligned}$$

so we obtain that

$$\begin{aligned} \mathbf {0}_p\in \left( \frac{\partial \mathcal {L}}{\partial \varvec{\mu }}\right) . \end{aligned}$$

Therefore, we have

$$\begin{aligned} \begin{aligned} \mathbf {0}_p&=c\mathbf {W}^{\top }(\mathbf {W}\varvec{\mu }^{k+1}-\varvec{\eta }^{k}) + \sum _{m=1}^Ma_m(\varvec{\mu }^{k+1}-\varvec{\mu }_m^{k}) + \mathbf {W}^{\top }\mathbf {s}^{k}+\sum _{m=1}^M\mathbf {q}_m^{k}\\&=c\mathbf {W}^{\top }(\mathbf {W}\varvec{\mu }^{k+1}-\varvec{\eta }^{k+1}) + \sum _{m=1}^Ma_m(\varvec{\mu }^{k+1}-\varvec{\mu }_m^{k+1}) + \mathbf {W}^{\top }\mathbf {s}^{k+1}+\sum _{m=1}^M\mathbf {q}_m^{k+1}\\&\quad +c\mathbf {W}^{\top }(\varvec{\eta }^{k+1}-\varvec{\eta }^{k}) + \sum _{m=1}^Ma_m(\varvec{\mu }_m^{k+1}-\varvec{\mu }_m^{k})- \mathbf {W}^{\top }(\mathbf {s}^{k+1}-\mathbf {s}^{k})-\sum _{m=1}^M(\mathbf {q}_m^{k+1}-\mathbf {q}_m^{k})\\&=\mathbf {W}^{\top }\mathbf {s}^{k+1}+\sum _{m=1}^M\mathbf {q}_m^{k+1}+c\mathbf {W}^{\top }(\varvec{\eta }^{k+1}-\varvec{\eta }^{k}) + \sum _{m=1}^Ma_m(\varvec{\mu }_m^{k+1}-\varvec{\mu }_m^{k}). \end{aligned} \end{aligned}$$

Thus, the dual residual is given by

$$\begin{aligned} \mathbf {d}^{k+1} = -c\mathbf {W}^{\top }(\varvec{\eta }^{k+1}-\varvec{\eta }^{k}) - \sum _{m=1}^Ma_m(\varvec{\mu }_m^{k+1}-\varvec{\mu }_m^{k}). \end{aligned}$$

Furthermore, we obtain that

$$\begin{aligned} \lim _{k\rightarrow \infty }\mathbf {d}^{k+1} = -c\mathbf {W}^{\top }(\varvec{\eta }^*-\varvec{\eta }^*) - \sum _{m=1}^Ma_m(\varvec{\mu }_m^*-\varvec{\mu }_m^*) = \mathbf {0}_p, \end{aligned}$$

which implies that the dual feasibility is satisfied. \(\square \)

About this article

Cite this article

Zhang, S., Feng, X. Distributed identification of heterogeneous treatment effects. Comput Stat 37, 57–89 (2022). https://doi.org/10.1007/s00180-021-01114-2
