Abstract
In the big data setting, working data sets are often distributed across multiple machines, whereas classical statistical methods are typically designed for estimation and inference on a single machine. We propose a novel parallel quasi-likelihood method for generalized linear models that makes the variances of the different sub-estimators comparable. Estimates are obtained from projection subsets of the data and then combined with suitably chosen weights. We show that the proposed method achieves better asymptotic efficiency than the simple average, and simulation examples show that it can substantially improve statistical inference.
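As a rough illustration of the combination step described above (a minimal sketch with made-up variances, not the paper's estimator): combining independent sub-estimates with inverse-variance weights yields a smaller variance than the simple average.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: G sub-estimators of a scalar parameter beta* = 2.0,
# each computed on its own data subset, with unequal variances v_g.
beta_star = 2.0
v = np.array([0.5, 1.0, 2.0, 4.0])   # hypothetical sub-estimator variances
G = len(v)
reps = 200_000
sub = beta_star + rng.standard_normal((reps, G)) * np.sqrt(v)

# Simple average: equal weights 1/G.
avg = sub.mean(axis=1)

# Inverse-variance weights w_g proportional to 1/v_g (they sum to 1).
w = (1.0 / v) / np.sum(1.0 / v)
weighted = sub @ w

print(avg.var())       # close to sum(v)/G**2 ~= 0.47
print(weighted.var())  # close to 1/sum(1/v) ~= 0.27, a clear improvement
```
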
References
Battey H, Fan J, Liu H, Lu J, Zhu Z (2015) Distributed estimation and inference with statistical guarantees. ArXiv:1509.05457
Carbonell F, Iturria-Medina Y, Jimenez JC (2016) Multiple shooting-local linearization method for the identification of dynamical systems. Commun Nonlinear Sci Numer Simul 37:292–304
Deuflhard P (2011) Newton methods for nonlinear problems: affine invariance and adaptive algorithms, vol 35. Springer, Berlin
Deuflhard P (2018) The grand four: affine invariant globalizations of Newton’s method. Vietnam J Math 46(4):761–777
Fahrmeir L, Tutz G (2013) Multivariate statistical modelling based on generalized linear models. Springer, Berlin
Guo G, You W, Qian G, Shao W (2015) Parallel maximum likelihood estimator for multiple linear regression models. J Comput Appl Math 273:251–263
Hasenclever L, Webb S, Lienart T, Vollmer S, Lakshminarayanan B, Blundell C, Teh Y (2017) Distributed bayesian learning with stochastic natural gradient expectation propagation and the posterior server. J Mach Learn Res 18(106):1–37
Hu T, Chang H (1999) Stability for randomly weighted sums of random elements. Int J Math Math Sci 22(3):559–568
Huang C, Huo X (2015) A distributed one-step estimator. ArXiv:1511.01443
Jordan MI, Lee JD, Yang Y (2018) Communication-efficient distributed statistical inference. J Am Stat Assoc 114:1–14
Kleiner A, Talwalkar A, Sarkar P, Jordan M (2014) A scalable bootstrap for massive data. J R Stat Soc Ser B (Stat Methodol) 76(4):795–816
Lang S (1993) Real and functional analysis. Springer, New York
Matoušek J (2008) On variants of the Johnson-Lindenstrauss lemma. Random Struct Algorithms 33(2):142–156
Minsker S, Strawn N (2017) Distributed statistical estimation and rates of convergence in normal approximation. ArXiv:1704.02658
Moualeu-Ngangue DP, Röblitz S, Ehrig R, Deuflhard P (2015) Parameter identification in a tuberculosis model for Cameroon. PLoS ONE 10(4):e0120607
Owen J, Wilkinson D, Gillespie C (2015) Scalable inference for Markov processes with intractable likelihoods. Stat Comput 25(1):145–156
Pilanci M, Wainwright MJ (2016) Iterative Hessian sketch: fast and accurate solution approximation for constrained least-squares. J Mach Learn Res 17(1):1842–1879
Pratola M, Chipman H, Gattiker J, Higdon D, McCulloch R, Rust W (2014) Parallel Bayesian additive regression trees. J Comput Graph Stat 23(3):830–852
Sengupta S, Volgushev S, Shao X (2016) A subsampled double bootstrap for massive data. J Am Stat Assoc 111(515):1222–1232
Shamir O, Srebro N, Zhang T (2014) Communication-efficient distributed optimization using an approximate Newton-type method. In: International conference on machine learning, pp 1000–1008
Song Q, Liang F (2015) A split-and-merge Bayesian variable selection approach for ultrahigh dimensional regression. J R Stat Soc Ser B (Stat Methodol) 77(5):947–972
Zhang K, Zhang L, Yang M (2012) Real-time compressive tracking. European conference on computer vision. Springer, Berlin, pp 864–877
Acknowledgements
Guangbao Guo and Yue Sun were supported by a Grant from Natural Science Foundation of Shandong under project ID ZR2016AM09. Xuejun Jiang was supported by the Natural Science Foundation of Guangdong (2017A030313012).
Appendix
Proof of Theorem 1:
We first prove the existence of the sequence \(\{w^{opt}_{n,g}\}_{g=1}^{G_n}\). Note that
And
Let
Then
(i) Suppose \(G_n=2m+1\) with \(m\in \mathbb {N}^+\). We set \(w_{n,1}+w_{n,2m+1}=\ldots =w_{n,m}+w_{n,m+2}=2/G_n\) and \(w_{n,m+1}=1/G_n\). Letting \(w_{n,1} =\ldots =w_{n,m} >1/G_n\), we have \(\text {tr(var}(\hat{\beta }_w))\le \text {tr(var}(\hat{\beta }_A))\).
Alternatively, set \(w_{n,1}+w_{n,2}+w_{n,2m}+w_{n,2m+1}=\ldots =w_{n,m-1}+w_{n,m} +w_{n,m+2} +w_{n,m+3}=4/G_n\) and \(w_{n,m+1}=1/G_n\). Letting \(w_{n,1}+w_{n,2}=\ldots =w_{n,m-1}+w_{n,m}>2/G_n\), we have \(\text {tr(var}(\hat{\beta }_w))\le \text {tr(var}(\hat{\beta }_A))\).
Similarly, set \(w_{n,1}+w_{n,2}+w_{n,3}+w_{n,2m-1}+w_{n,2m}+w_{n,2m+1}=\ldots =w_{n,m-2}+w_{n,m-1}+w_{n,m} +w_{n,m+2} +w_{n,m+3}+w_{n,m+4}=6/G_n\) and \(w_{n,m+1}=1/G_n\). Letting \(w_{n,1}+w_{n,2}+w_{n,3}=\ldots =w_{n,m-2}+w_{n,m-1}+w_{n,m}>3/G_n\), we have \(\text {tr(var}(\hat{\beta }_w))\le \text {tr(var}(\hat{\beta }_A))\).
(ii) Suppose \(G_n=2m\) with \(m\in \mathbb {N}^+\). We set \(w_{n,1}+w_{n,2m}=\ldots =w_{n,m}+w_{n,m+1}=2/G_n\). Letting \(w_{n,1} =\ldots =w_{n,m} >1/G_n\), we again have \(\text {tr(var}(\hat{\beta }_w))\le \text {tr(var}(\hat{\beta }_A))\).
When m is even, we may instead set \(w_{n,1}+w_{n,2}+w_{n,2m-1}+w_{n,2m}=\ldots =w_{n,m-1}+w_{n,m}+w_{n,m+1}+w_{n,m+2}=4/G_n\). Letting \(w_{n,1}+w_{n,2}=\ldots =w_{n,m-1}+w_{n,m}>2/G_n\), we have \(\text {tr(var}(\hat{\beta }_w))\le \text {tr(var}(\hat{\beta }_A))\).
We now derive the optimal weight sequence \(\{w^{opt}_{n,g}\}_{g=1}^{G_n}\) for \(\hat{\beta }_w\) by the Lagrange multiplier method. We set \(v_g=\text {tr(var}(\hat{\beta }_{n,g}))\) and objective function
\[ l(w,\lambda )=\sum _{g=1}^{G_n}w_{n,g}^2v_g+\lambda \Big (\sum _{g=1}^{G_n}w_{n,g}-1\Big ), \]
where \(\lambda \) is a Lagrangian multiplier parameter. Taking the derivative of \(l(w,\lambda )\) with respect to \(w_{n,g}\) gives
\[ \frac{\partial l(w,\lambda )}{\partial w_{n,g}}=2w_{n,g}v_g+\lambda =0,\quad g=1,\ldots ,G_n. \]
And differentiating with respect to \(\lambda \),
\[ \frac{\partial l(w,\lambda )}{\partial \lambda }=\sum _{g=1}^{G_n}w_{n,g}-1=0. \]
Solving this system, we obtain the optimal weight
\[ w^{opt}_{n,g}=\frac{v_g^{-1}}{\sum _{h=1}^{G_n}v_h^{-1}},\quad g=1,\ldots ,G_n. \]
We also obtain \(\text {tr(var}(\hat{\beta }_{w^{opt}}))=\sum _{g=1}^{G_n}(w^{opt}_{n,g})^2v_g=1/\sum _{g=1}^{G_{n}} v_g^{-1}\). \(\Box \)
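A numerical sanity check of this closed form (a sketch assuming hypothetical block variances \(v_g\), not data from the paper): the inverse-variance weights attain the stated minimum \(1/\sum _{g=1}^{G_n} v_g^{-1}\), and random weights on the simplex never do better.

```python
import numpy as np

# Hypothetical variances v_g = tr(var(beta_hat_{n,g})) for G_n = 5 blocks.
v = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Closed-form optimal weights from the Lagrangian argument:
# w_g = v_g^{-1} / sum_h v_h^{-1}.
w_opt = (1.0 / v) / np.sum(1.0 / v)

# Achieved objective equals the stated optimum 1 / sum v_g^{-1}.
obj = np.sum(w_opt**2 * v)
print(obj, 1.0 / np.sum(1.0 / v))  # the two values agree

# Brute-force check: random weights on the simplex never do better.
rng = np.random.default_rng(1)
for _ in range(1000):
    w = rng.dirichlet(np.ones(len(v)))
    assert np.sum(w**2 * v) >= obj - 1e-12
```
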
Proof of Theorem 2:
Observing that \(e^x\le 1+x+\frac{1}{2}x^2e^{|x|}\) (\(x>0\)), we obtain
Here M is a large constant such that \(M>(r+\epsilon )\) and c is a suitable constant. The theorem follows. \(\Box \)
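The elementary inequality \(e^x\le 1+x+\frac{1}{2}x^2e^{|x|}\) used above can be checked numerically on a grid (a quick sanity check, not part of the proof):

```python
import numpy as np

# Verify e^x <= 1 + x + 0.5 * x^2 * e^|x| on a dense grid of positive x.
x = np.linspace(1e-6, 20, 100_000)
lhs = np.exp(x)
rhs = 1 + x + 0.5 * x**2 * np.exp(np.abs(x))
print(np.all(lhs <= rhs))  # the inequality holds everywhere on the grid
```
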
Proof of Theorem 3:
We prove the theorem via the characteristic function method. For \(1\le u\le G_n-1\),
By Eqs. (3.3) and (3.4), for any \(\epsilon >0\), there exists \(u=u_\epsilon \) satisfying
Define
Since \(2\Vert |\text {cov}(Y_{n,g},Y_{n,g+1})\Vert |\le \Vert |\text {var}(Y_{n,g})\Vert |+\Vert |\text {var}(Y_{n,g+1})\Vert |\), the set \(A_h\) is nonempty for every h. Let \(m_0=0\) and \(m_{h+1}=\min \{m:m>m_h,\ m\in A_h\}\) to define \(\{m_g\}_{g=1}^{G_n}\), and set
Note that
For any t,
We then have the theorem. \(\Box \)
Proof of Theorem 4:
We first show that \(\sqrt{n}(\hat{\beta }_w-\beta ^*)\) is bounded when \(G_n=O(p_n)\); to do so, we use two lemmas of Huang and Huo (2015).
By applying Theorem 3 and (7.1), we have
Then \(\lim _{n\rightarrow \infty } \sqrt{n} E (\hat{\beta }_{n,1}-\beta ^*)\) is finite. By (7.2), we have \(\Vert E(\hat{\beta }_{n,g}-\beta ^*)\Vert =O(1/p_n),\ g=1,2,\ldots ,G_n\). Thus
and when \(G_n=O(p_n)\), \(\sqrt{n} \Vert \hat{\beta }_w-\beta ^*\Vert =O(1)\ \text {as} \ n \rightarrow \infty \).
By Theorem 4.2 of Lang (1993), there exists \(\rho \in [0,1]\) such that
For \(G_n=O(p_n)\), by applying Theorem 3, we obtain
Since \(F(\cdot )\) is a continuous function,
Thus
We have \(F(\hat{\beta }_w){\mathop {\longrightarrow }\limits ^{P}}F(\beta ^*)\) due to \(\hat{\beta }_w{\mathop {\longrightarrow }\limits ^{P}}\beta ^*\). By Slutsky’s lemma, we thus have
\(\Box \)
Proof of Theorem 5:
Note that
we then obtain
It is observed that
Then
By Hölder’s inequality, we have
Applying \((b+c)^2\le 2(b^2+c^2)\), we obtain
\(\Box \)
Proof of Theorem 6:
Noting that
we have \(Ee_{g,k}=r_g/n \) in (2.2). By Hoeffding’s inequality, we have, for \(t>0\),
Setting \(t=\varepsilon \), we obtain
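For intuition, the Hoeffding bound used above can be illustrated by simulation (a generic sketch with Bernoulli draws; the actual quantities \(e_{g,k}\) are defined in (2.2)):

```python
import numpy as np

rng = np.random.default_rng(2)

# Mean of n i.i.d. [0,1]-valued (here Bernoulli) variables with mean p.
n, p, t = 500, 0.3, 0.05
reps = 100_000
means = rng.binomial(n, p, size=reps) / n

# Empirical tail probability versus Hoeffding's bound 2*exp(-2*n*t^2).
empirical = np.mean(np.abs(means - p) >= t)
bound = 2 * np.exp(-2 * n * t**2)
print(empirical, bound)  # the empirical tail sits below the bound
```
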
Guo, G., Sun, Y. & Jiang, X. A partitioned quasi-likelihood for distributed statistical inference. Comput Stat 35, 1577–1596 (2020). https://doi.org/10.1007/s00180-020-00974-4