Abstract
In the big data setting, working data sets are often distributed across multiple machines, whereas classical statistical methods are typically designed for estimation and inference on a single machine. We propose a novel parallel quasi-likelihood method for generalized linear models that makes the variances of the different sub-estimators comparable. Estimates are obtained from projection subsets of the data and then combined with suitably chosen weights. We show that the proposed method achieves better asymptotic efficiency than the simple average, and simulation examples show that it can substantially improve statistical inference.
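As a rough illustration of the combination step described above (a minimal sketch with made-up variances, not the paper's estimator): combining independent sub-estimates with inverse-variance weights yields a smaller variance than the simple average.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: G sub-estimators of a scalar parameter beta* = 2.0,
# each computed on its own data subset, with unequal variances v_g.
beta_star = 2.0
v = np.array([0.5, 1.0, 2.0, 4.0])   # hypothetical sub-estimator variances
G = len(v)
reps = 200_000
sub = beta_star + rng.standard_normal((reps, G)) * np.sqrt(v)

# Simple average: equal weights 1/G.
avg = sub.mean(axis=1)

# Inverse-variance weights w_g proportional to 1/v_g (they sum to 1).
w = (1.0 / v) / np.sum(1.0 / v)
weighted = sub @ w

print(avg.var())       # close to sum(v)/G**2 ~= 0.47
print(weighted.var())  # close to 1/sum(1/v) ~= 0.27, a clear improvement
```
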
References
Battey H, Fan J, Liu H, Lu J, Zhu Z (2015) Distributed estimation and inference with statistical guarantees. ArXiv:1509.05457
Carbonell F, Iturria-Medina Y, Jimenez JC (2016) Multiple shooting-local linearization method for the identification of dynamical systems. Commun Nonlinear Sci Numer Simul 37:292–304
Deuflhard P (2011) Newton methods for nonlinear problems: affine invariance and adaptive algorithms, vol 35. Springer, Berlin
Deuflhard P (2018) The grand four: affine invariant globalizations of Newton’s method. Vietnam J Math 46(4):761–777
Fahrmeir L, Tutz G (2013) Multivariate statistical modelling based on generalized linear models. Springer, Berlin
Guo G, You W, Qian G, Shao W (2015) Parallel maximum likelihood estimator for multiple linear regression models. J Comput Appl Math 273:251–263
Hasenclever L, Webb S, Lienart T, Vollmer S, Lakshminarayanan B, Blundell C, Teh Y (2017) Distributed bayesian learning with stochastic natural gradient expectation propagation and the posterior server. J Mach Learn Res 18(106):1–37
Hu T, Chang H (1999) Stability for randomly weighted sums of random elements. Int J Math Math Sci 22(3):559–568
Huang C, Huo X (2015) A distributed one-step estimator. ArXiv:1511.01443
Jordan MI, Lee JD, Yang Y (2018) Communication-efficient distributed statistical inference. J Am Stat Assoc 114:1–14
Kleiner A, Talwalkar A, Sarkar P, Jordan M (2014) A scalable bootstrap for massive data. J R Stat Soc Ser B (Stat Methodol) 76(4):795–816
Lang S (1993) Real and functional analysis. Springer, New York
Matoušek J (2008) On variants of the Johnson-Lindenstrauss lemma. Random Struct Algorithms 33(2):142–156
Minsker S, Strawn N (2017) Distributed statistical estimation and rates of convergence in normal approximation. ArXiv:1704.02658
Moualeu-Ngangue DP, Röblitz S, Ehrig R, Deuflhard P (2015) Parameter identification in a tuberculosis model for Cameroon. PLoS ONE 10(4):e0120607
Owen J, Wilkinson D, Gillespie C (2015) Scalable inference for Markov processes with intractable likelihoods. Stat Comput 25(1):145–156
Pilanci M, Wainwright MJ (2016) Iterative Hessian sketch: fast and accurate solution approximation for constrained least-squares. J Mach Learn Res 17(1):1842–1879
Pratola M, Chipman H, Gattiker J, Higdon D, McCulloch R, Rust W (2014) Parallel Bayesian additive regression trees. J Comput Graph Stat 23(3):830–852
Sengupta S, Volgushev S, Shao X (2016) A subsampled double bootstrap for massive data. J Am Stat Assoc 111(515):1222–1232
Shamir O, Srebro N, Zhang T (2014) Communication-efficient distributed optimization using an approximate Newton-type method. In: International conference on machine learning, pp 1000–1008
Song Q, Liang F (2015) A split-and-merge Bayesian variable selection approach for ultrahigh dimensional regression. J R Stat Soc Ser B (Stat Methodol) 77(5):947–972
Zhang K, Zhang L, Yang M (2012) Real-time compressive tracking. European conference on computer vision. Springer, Berlin, pp 864–877
Acknowledgements
Guangbao Guo and Yue Sun were supported by a Grant from Natural Science Foundation of Shandong under project ID ZR2016AM09. Xuejun Jiang was supported by the Natural Science Foundation of Guangdong (2017A030313012).
Appendix
Proof of Theorem 1:
We first prove the existence of the sequence \(\{w^{opt}_{n,g}\}_{g=1}^{G_n}\). Note that
And
Let
Then
(i) Suppose \(G_n=2m+1\) with \(m\in \mathbb {N}^+\). We set \(w_{n,1}+w_{n,2m+1}=\ldots =w_{n,m}+w_{n,m+2}=2/G_n\) and \(w_{n,m+1}=1/G_n\). Letting \(w_{n,1} =\ldots =w_{n,m} >1/G_n\), we have \(\text {tr(var}(\hat{\beta }_w))\le \text {tr(var}(\hat{\beta }_A))\).
Alternatively, set \(w_{n,1}+w_{n,2}+w_{n,2m}+w_{n,2m+1}=\ldots =w_{n,m-1}+w_{n,m} +w_{n,m+2} +w_{n,m+3}=4/G_n\) and \(w_{n,m+1}=1/G_n\). Letting \(w_{n,1}+w_{n,2}=\ldots =w_{n,m-1}+w_{n,m}>2/G_n\), we have \(\text {tr(var}(\hat{\beta }_w))\le \text {tr(var}(\hat{\beta }_A))\).
Similarly, set \(w_{n,1}+w_{n,2}+w_{n,3}+w_{n,2m-1}+w_{n,2m}+w_{n,2m+1}=\ldots =w_{n,m-2}+w_{n,m-1}+w_{n,m} +w_{n,m+2} +w_{n,m+3}+w_{n,m+4}=6/G_n\) and \(w_{n,m+1}=1/G_n\). Letting \(w_{n,1}+w_{n,2}+w_{n,3}=\ldots =w_{n,m-2}+w_{n,m-1}+w_{n,m}>3/G_n\), we have \(\text {tr(var}(\hat{\beta }_w))\le \text {tr(var}(\hat{\beta }_A))\).
(ii) Suppose \(G_n=2m\) with \(m\in \mathbb {N}^+\). We set \(w_{n,1}+w_{n,2m}=\ldots =w_{n,m}+w_{n,m+1}=2/G_n\). Letting \(w_{n,1} =\ldots =w_{n,m} >1/G_n\), we again have \(\text {tr(var}(\hat{\beta }_w))\le \text {tr(var}(\hat{\beta }_A))\).
When m is even, we may instead set \(w_{n,1}+w_{n,2}+w_{n,2m-1}+w_{n,2m}=\ldots =w_{n,m-1}+w_{n,m}+w_{n,m+1}+w_{n,m+2}=4/G_n\). Letting \(w_{n,1}+w_{n,2}=\ldots =w_{n,m-1}+w_{n,m}>2/G_n\), we have \(\text {tr(var}(\hat{\beta }_w))\le \text {tr(var}(\hat{\beta }_A))\).
We now derive the optimal weight sequence \(\{w^{opt}_{n,g}\}_{g=1}^{G_n}\) for \(\hat{\beta }_w\) by the Lagrange multiplier method. We set \(v_g=\text {tr(var}(\hat{\beta }_{n,g}))\) and objective function
\[ l(w,\lambda )=\sum _{g=1}^{G_n}w_{n,g}^2v_g+\lambda \Big (\sum _{g=1}^{G_n}w_{n,g}-1\Big ), \]
where \(\lambda \) is a Lagrangian multiplier parameter. Taking the derivative of \(l(w,\lambda )\) with respect to \(w_{n,g}\) gives
\[ \frac{\partial l(w,\lambda )}{\partial w_{n,g}}=2w_{n,g}v_g+\lambda =0,\quad g=1,\ldots ,G_n. \]
And differentiating with respect to \(\lambda \),
\[ \frac{\partial l(w,\lambda )}{\partial \lambda }=\sum _{g=1}^{G_n}w_{n,g}-1=0. \]
Solving this system, we obtain the optimal weight
\[ w^{opt}_{n,g}=\frac{v_g^{-1}}{\sum _{h=1}^{G_n}v_h^{-1}},\quad g=1,\ldots ,G_n. \]
We also obtain \(\text {tr(var}(\hat{\beta }_{w^{opt}}))=\sum _{g=1}^{G_n}(w^{opt}_{n,g})^2v_g=1/\sum _{g=1}^{G_{n}} v_g^{-1}\). \(\Box \)
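A numerical sanity check of this closed form (a sketch assuming hypothetical block variances \(v_g\), not data from the paper): the inverse-variance weights attain the stated minimum \(1/\sum _{g=1}^{G_n} v_g^{-1}\), and random weights on the simplex never do better.

```python
import numpy as np

# Hypothetical variances v_g = tr(var(beta_hat_{n,g})) for G_n = 5 blocks.
v = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Closed-form optimal weights from the Lagrangian argument:
# w_g = v_g^{-1} / sum_h v_h^{-1}.
w_opt = (1.0 / v) / np.sum(1.0 / v)

# Achieved objective equals the stated optimum 1 / sum v_g^{-1}.
obj = np.sum(w_opt**2 * v)
print(obj, 1.0 / np.sum(1.0 / v))  # the two values agree

# Brute-force check: random weights on the simplex never do better.
rng = np.random.default_rng(1)
for _ in range(1000):
    w = rng.dirichlet(np.ones(len(v)))
    assert np.sum(w**2 * v) >= obj - 1e-12
```
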
Proof of Theorem 2:
Observing that \(e^x\le 1+x+\frac{1}{2}x^2e^{|x|}\) (\(x>0\)), we obtain
Here M is a large constant such that \(M>(r+\epsilon )\) and c is a suitable constant. The theorem follows. \(\Box \)
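The elementary inequality \(e^x\le 1+x+\frac{1}{2}x^2e^{|x|}\) used above can be checked numerically on a grid (a quick sanity check, not part of the proof):

```python
import numpy as np

# Verify e^x <= 1 + x + 0.5 * x^2 * e^|x| on a dense grid of positive x.
x = np.linspace(1e-6, 20, 100_000)
lhs = np.exp(x)
rhs = 1 + x + 0.5 * x**2 * np.exp(np.abs(x))
print(np.all(lhs <= rhs))  # the inequality holds everywhere on the grid
```
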
Proof of Theorem 3:
We prove the theorem via the characteristic function method. For \(1\le u\le G_n-1\),
By Eqs. (3.3) and (3.4), for any \(\epsilon >0\), there exists \(u=u_\epsilon \) satisfying
Define
Since \(2\Vert |\text {cov}(Y_{n,g},Y_{n,g+1})\Vert |\le \Vert |\text {var}(Y_{n,g})\Vert |+\Vert |\text {var}(Y_{n,g+1})\Vert |\), the set \(A_h\) is nonempty for every h. Let \(m_0=0\) and \(m_{h+1}=\min \{m:m>m_h,\ m\in A_h\}\) to define \(\{m_g\}_{g=1}^{G_n}\), and set
Note that
For any t,
We then have the theorem. \(\Box \)
Proof of Theorem 4:
We first show that \(\sqrt{n}(\hat{\beta }_w-\beta ^*)\) is bounded when \(G_n=O(p_n)\); to do so, we use two lemmas of Huang and Huo (2015).
By applying Theorem 3 and (7.1), we have
Then \(\lim _{n\rightarrow \infty } \sqrt{n} E (\hat{\beta }_{n,1}-\beta ^*)\) is finite. By (7.2), we have \(\Vert E(\hat{\beta }_{n,g}-\beta ^*)\Vert =O(1/p_n),\ g=1,2,\ldots ,G_n\). Thus
and when \(G_n=O(p_n)\), \(\sqrt{n} \Vert \hat{\beta }_w-\beta ^*\Vert =O(1)\ \text {as} \ n \rightarrow \infty \).
By Theorem 4.2 of Lang (1993), there exists \(\rho \in [0,1]\) such that
For \(G_n=O(p_n)\), by applying Theorem 3, we obtain
Since \(F(\cdot )\) is a continuous function,
Thus
We have \(F(\hat{\beta }_w){\mathop {\longrightarrow }\limits ^{P}}F(\beta ^*)\) due to \(\hat{\beta }_w{\mathop {\longrightarrow }\limits ^{P}}\beta ^*\). By Slutsky’s lemma, we thus have
\(\Box \)
Proof of Theorem 5:
Note that
we then obtain
It is observed that
Then
By Hölder’s inequality, we have
Applying \((b+c)^2\le 2(b^2+c^2)\), we obtain
\(\Box \)
Proof of Theorem 6:
Noting that
we have \(Ee_{g,k}=r_g/n \) in (2.2). By Hoeffding’s inequality, we have, for \(t>0\),
Setting \(t=\varepsilon \), we obtain
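For intuition, the Hoeffding bound used above can be illustrated by simulation (a generic sketch with Bernoulli draws; the actual quantities \(e_{g,k}\) are defined in (2.2)):

```python
import numpy as np

rng = np.random.default_rng(2)

# Mean of n i.i.d. [0,1]-valued (here Bernoulli) variables with mean p.
n, p, t = 500, 0.3, 0.05
reps = 100_000
means = rng.binomial(n, p, size=reps) / n

# Empirical tail probability versus Hoeffding's bound 2*exp(-2*n*t^2).
empirical = np.mean(np.abs(means - p) >= t)
bound = 2 * np.exp(-2 * n * t**2)
print(empirical, bound)  # the empirical tail sits below the bound
```
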
Guo, G., Sun, Y. & Jiang, X. A partitioned quasi-likelihood for distributed statistical inference. Comput Stat 35, 1577–1596 (2020). https://doi.org/10.1007/s00180-020-00974-4