Parallel computing in linear mixed models

Gokalp Yavuz, Fulya; Schloerke, Barret

doi:10.1007/s00180-019-00950-7

Original paper
Published: 07 January 2020

Computational Statistics, Volume 35, pages 1273–1289 (2020)

Abstract

In this study, we propose a parallel programming method for linear mixed models (LMM) generated from big data. The expectation maximization (EM) algorithm is commonly used for such models and is preferred because its maximum likelihood estimates are stable and simple to compute. However, EM has a high computational cost. Our proposed method uses a divide-and-recombine approach: it splits the data into smaller subsets, runs the algorithm steps in parallel on multiple local cores, and combines the results. The method is used to fit LMM with dense and sparse parameters and with a large number of observations. It is faster than the classical approach and generalizes to big data. Supplementary sources for the proposed method are available in the R package lmmpar.
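
The divide-and-recombine idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and not the lmmpar API: as a simplifying assumption it fits an ordinary least squares line rather than running EM on a mixed model, and the names `chunk_stats`, `split`, and `fit_divide_recombine` are hypothetical. It also uses a thread pool for portability; a process pool would be the natural swap for real multi-core work. What it shares with the described method is the pattern: split the data, compute additive summaries on each subset concurrently, then combine them.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_stats(rows):
    """Sufficient statistics of one subset for the model y = b0 + b1 * x."""
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    return n, sx, sy, sxx, sxy


def split(rows, n_chunks):
    """Divide: deal the rows into n_chunks roughly equal subsets."""
    return [rows[i::n_chunks] for i in range(n_chunks)]


def fit_divide_recombine(rows, n_chunks=4):
    """Fit intercept and slope from per-subset summaries."""
    chunks = split(rows, n_chunks)
    # Apply: summarize every subset concurrently (threads here; an
    # assumption made for portability, not a claim about the paper).
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(chunk_stats, chunks))
    # Recombine: the sufficient statistics simply add across subsets.
    n, sx, sy, sxx, sxy = (sum(col) for col in zip(*partials))
    b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b0 = (sy - b1 * sx) / n
    return b0, b1


# Noise-free toy data on the line y = 2 + 3x.
data = [(float(x), 2.0 + 3.0 * x) for x in range(20)]
b0, b1 = fit_divide_recombine(data)
```

Because the summaries are sums, the combined fit is identical to a single pass over all the data; the number of subsets only changes how the work is distributed.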

References

Broderick T, Boyd N, Wibisono A, Wilson AC, Jordan MI (2013) Streaming variational Bayes. In proceedings of the 26th international conference on neural information processing systems—volume 2, NIPS’13. Curran Associates Inc, New York, pp 1727–1735
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc B 39(1):1–38
Gokalp Yavuz F, Schloerke B (2017) Parallel Linear Mixed Model. R package version 0.1.0. https://CRAN.R-project.org/package=lmmpar
Guo G (2012) Parallel statistical computing for statistical inference. J Stat Theory Pract 6(3):536–565
Guo G, You W, Qian G, Shao W (2015) Parallel maximum likelihood estimator for multiple linear regression models. J Comput Appl Math 273:251–263
Kane MJ, Emerson J, Weston S (2013) Scalable strategies for computing with massive data. J Stat Softw 55(14):1–19
Kontoghiorghes EJ (2005) Handbook of parallel computing and statistics (statistics, textbooks and monographs). Chapman & Hall/CRC, Boca Raton
Laird NM, Ware JH (1982) Random-effects models for longitudinal data. Biometrics 38(4):963–974
Liu C, Rubin DB (1994) The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence. Biometrika 81(4):633
Maclaurin D, Adams RP (2014) Firefly Monte Carlo: exact MCMC with subsets of data. In: Proceedings of the thirtieth conference on uncertainty in artificial intelligence, UAI’14. AUAI Press, Arlington, pp 543–552
Nagel K, Rickert M (2001) Parallel implementation of the TRANSIMS micro-simulation. Parallel Comput 27:1611–1639
Neiswanger W, Wang C, Xing EP (2014) Asymptotically exact, embarrassingly parallel MCMC. In Proceedings of the thirtieth conference on uncertainty in artificial intelligence, UAI’14. AUAI Press, Arlington, pp 623–632
Ooi H, Microsoft Corporation, Weston S, Tenenbaum D (2019a) doParallel: foreach parallel adaptor for the ‘parallel’ package. R package version 1.0.15. https://cran.r-project.org/web/packages/doParallel/index.html
Ooi H, Microsoft Corporation, Weston S (2019b) Foreach: provides foreach looping construct. R package version 1.4.7. https://cran.r-project.org/web/packages/foreach/index.html
Pinheiro JC, Liu C, Wu YN (2001) Efficient algorithms for robust estimation in linear mixed-effects models using the multivariate t distribution. J Comput Graph Stat 10(2):249–276
R Core Team (2017) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
Renaut RA (1998) A parallel multisplitting solution of the least squares problem. Numer Linear Algeb Appl 5(1):11–31
Schafer JL (1998) Some improved procedures for linear mixed models. Technical Report, Department of Statistics, The Pennsylvania State University
Tran M-N, Nott DJ, Kuk AYC, Kohn R (2016) Parallel variational Bayes for large datasets with an application to generalized linear mixed models. J Comput Graph Stat 25(2):626–646
Wickham H (2011) The split-apply-combine strategy for data analysis. J Stat Softw 40(1):1–29
Wolfe J, Haghighi A, Klein D (2008) Fully distributed EM for very large datasets. In: Proceedings of the 25th international conference on machine learning, ICML ’08. ACM, New York, pp 1184–1191
Yavuz FG, Arslan O (2018) Linear mixed model with Laplace distribution (LLMM). Stat Pap 59(1):271–289

Author information

Authors and Affiliations

Purdue University, 250 N. University St., West Lafayette, IN, 47906, USA
Fulya Gokalp Yavuz & Barret Schloerke
Department of Statistics, Middle East Technical University, 140, Ankara, Turkey
Fulya Gokalp Yavuz

Corresponding author

Correspondence to Fulya Gokalp Yavuz.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gokalp Yavuz, F., Schloerke, B. Parallel computing in linear mixed models. Comput Stat 35, 1273–1289 (2020). https://doi.org/10.1007/s00180-019-00950-7

Received: 31 January 2018
Accepted: 23 December 2019
Published: 07 January 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s00180-019-00950-7
