Abstract
In this study, we propose a parallel programming method for linear mixed models (LMM) fitted to big data. A commonly used algorithm, expectation maximization (EM), is preferred because it yields maximum likelihood estimates that are stable and simple to compute. However, EM has a high computational cost. In our proposed method, we use a divide-and-recombine approach: the data are split into smaller subsets, the algorithm steps are run in parallel on multiple local cores, and the results are combined. The proposed method is used to fit LMMs with dense and sparse parameters and with a large number of observations. It is faster than the classical approach and generalizes to big data. Supplementary sources for the proposed method are available in the R package lmmpar.
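The divide-and-recombine idea can be illustrated with a minimal Python sketch. It is not the lmmpar implementation. It assumes the Laird-Ware model y_i = X_i β + Z_i b_i + e_i with b_i ~ N(0, D) and e_i ~ N(0, σ² I), and the names `chunk_stats`, `fit_lmm_em` and `simulate` are hypothetical. A thread pool stands in for the multiple local cores. Each worker runs the E-step on its own subset and returns summed sufficient statistics, and the M-step recombines them.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def chunk_stats(chunk, beta, D, sigma2):
    """E-step on one data subset: return summed sufficient statistics."""
    p, q = beta.size, D.shape[0]
    D_inv = np.linalg.inv(D)
    XtX, Xtr, S_b = np.zeros((p, p)), np.zeros(p), np.zeros((q, q))
    rtr, trace, n_obs = 0.0, 0.0, 0
    for y, X, Z in chunk:
        # Posterior covariance and mean of the random effect b_i.
        V = np.linalg.inv(Z.T @ Z / sigma2 + D_inv)
        b = V @ Z.T @ (y - X @ beta) / sigma2
        r = y - Z @ b  # response with the predicted random effect removed
        XtX += X.T @ X
        Xtr += X.T @ r
        S_b += np.outer(b, b) + V
        rtr += r @ r
        trace += np.trace(Z @ V @ Z.T)
        n_obs += y.size
    return XtX, Xtr, S_b, rtr, trace, n_obs, len(chunk)


def fit_lmm_em(subjects, p, q, n_chunks=4, n_iter=50):
    """EM for an LMM with the E-step divided over n_chunks workers."""
    chunks = [subjects[k::n_chunks] for k in range(n_chunks)]  # divide
    beta, D, sigma2 = np.zeros(p), np.eye(q), 1.0
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        for _ in range(n_iter):
            parts = list(pool.map(
                lambda c: chunk_stats(c, beta, D, sigma2), chunks))
            # Recombine: every statistic is a plain sum over the subsets.
            XtX, Xtr, S_b, rtr, trace, n_obs, m = (
                sum(s) for s in zip(*parts))
            # M-step using only the combined statistics.
            beta = np.linalg.solve(XtX, Xtr)
            D = S_b / m
            sigma2 = (rtr - beta @ Xtr + trace) / n_obs
    return beta, D, sigma2


def simulate(m=200, n_i=5, seed=0):
    """Toy random-intercept data (illustrative values only)."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, -2.0])
    subjects = []
    for _ in range(m):
        X = np.column_stack([np.ones(n_i), rng.normal(size=n_i)])
        Z = np.ones((n_i, 1))
        b = rng.normal(scale=1.5, size=1)
        y = X @ beta + Z @ b + rng.normal(scale=0.5, size=n_i)
        subjects.append((y, X, Z))
    return subjects


if __name__ == "__main__":
    data = simulate()
    print(fit_lmm_em(data, p=2, q=1, n_chunks=4, n_iter=20))
```

Because the statistics are additive over subjects, the fit with several subsets should agree with the single-subset fit up to floating-point rounding. In this sketch only the E-step is parallelized. A process pool could replace the thread pool when the per-subset work is large enough to outweigh the cost of copying data between processes.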
About this article
Cite this article
Gokalp Yavuz, F., Schloerke, B. Parallel computing in linear mixed models. Comput Stat 35, 1273–1289 (2020). https://doi.org/10.1007/s00180-019-00950-7
DOI: https://doi.org/10.1007/s00180-019-00950-7