Abstract
The expectation–maximization (EM) algorithm is a popular algorithm for finding maximum likelihood estimates from incomplete data. However, the EM algorithm converges slowly when the proportion of missing data is large. Although many acceleration algorithms have been proposed, they require complex calculations. Kuroda and Sakakihara (Comput Stat Data Anal 51:1549–1561, 2006) developed the \(\varepsilon \)-accelerated EM algorithm, which uses only the sequence of estimates obtained by the EM algorithm to generate an accelerated sequence, without changing the original EM sequence. We find that the accelerated sequence often attains larger likelihood values than the current estimate obtained by the EM algorithm. In this paper, we therefore re-start the EM iterations from the accelerated estimate and thus generate a new EM sequence with a faster speed of convergence. The algorithm has the further advantage of simple implementation, since it only runs the EM iterations and re-starts them from an estimate with a larger likelihood. The resulting re-starting algorithm, called the \(\varepsilon \)R-accelerated EM algorithm, improves on both the EM algorithm and the \(\varepsilon \)-accelerated EM algorithm in the sense that it reduces the number of iterations and the computation time.
Notes
Times are typically reported to a precision of 10 msec.
References
Biernacki C, Celeux G, Govaert G (2003) Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput Stat Data Anal 41:561–575
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–22
Jamshidian M, Jennrich RI (1993) Conjugate gradient acceleration of the EM algorithm. J Am Stat Assoc 88:221–228
Jamshidian M, Jennrich RI (1997) Acceleration of the EM algorithm by using quasi-Newton methods. J R Stat Soc Ser B 59:569–587
Jones O, Maillardet R, Robinson A (2009) Introduction to scientific programming and simulation using R. Chapman & Hall/CRC, Boca Raton
Karlis D, Xekalaki E (2003) Choosing initial values for the EM algorithm for finite mixtures. Comput Stat Data Anal 41:577–590
Kuroda M, Sakakihara M (2006) Accelerating the convergence of the EM algorithm using the vector \(\varepsilon \) algorithm. Comput Stat Data Anal 51:1549–1561
Laird NM, Lange K, Stram DO (1987) Maximum likelihood computations with repeated measures: application of the EM algorithm. J Am Stat Assoc 82:97–105
Lange K (1995) A quasi-Newton acceleration of the EM algorithm. Stat Sin 5:1–18
Lee G, Scott C (2012) EM algorithms for multivariate Gaussian mixture models with truncated and censored data. Comput Stat Data Anal 56:2816–2829
Lee G, Finn W, Scott C (2011) Statistical file matching of flow cytometry data. J Biomed Inform 44:663–676
Lin TI (2009) Maximum likelihood estimation for multivariate skew normal mixture models. J Multivar Anal 100:257–265
Little RJA, Rubin DB (1987) Statistical analysis with missing data. Wiley, New York
Louis TA (1982) Finding the observed information matrix when using the EM algorithm. J R Stat Soc Ser B 44:226–233
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York
Meng XL, Rubin DB (1994) On the global and componentwise rates of convergence of the EM algorithm. Linear Algebra Appl 199:413–425
Pyne S, Hu X, Wang K, Rossin E, Lin TI, Maier LM, Baecher-Allan C, McLachlan GJ, Tamayo P, Hafler DA, De Jager PL, Mesirov JP (2009) Automated high-dimensional flow cytometry data analysis. Proc Natl Acad Sci USA 106:8519–8524
R Development Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org
Schafer JL (1997) Analysis of incomplete multivariate data. Chapman & Hall/CRC, London
Snedecor GW, Cochran WG (1967) Statistical methods. Iowa State University Press, Ames
Wang M, Kuroda M, Sakakihara M, Geng Z (2008) Acceleration of the EM algorithm using the vector epsilon algorithm. Comput Stat 23:469–486
Wynn P (1962) Acceleration techniques for iterated vector and matrix problems. Math Comp 16:301–322
Acknowledgments
The authors would like to thank the editor and two referees for their valuable comments and helpful suggestions. This research is supported by the Japan Society for the Promotion of Science (JSPS), Grant-in-Aid for Scientific Research (C), No. 24500353.
Appendices
Appendix 1: The vector \(\varepsilon \) algorithm
Let \(\theta ^{(t)}\) denote a vector of dimensionality \(d\) that converges to a vector \(\theta ^{(\infty )}\) as \(t \rightarrow \infty \). Let the inverse \([\theta ]^{-1}\) of a vector \(\theta \) be defined by
$$\begin{aligned}{} [\theta ]^{-1} = \frac{\theta }{|| \theta ||^{2}}, \end{aligned}$$
where \(|| \theta ||\) is the Euclidean norm of \(\theta \).
In general, the vector \(\varepsilon \) algorithm for a sequence \(\{\theta ^{(t)}\}_{t \ge 0}\) starts with
$$\begin{aligned} \varepsilon ^{(t,-1)} = \mathbf {0}, \qquad \varepsilon ^{(t,0)} = \theta ^{(t)}, \end{aligned}$$
and then generates a vector \(\varepsilon ^{(t,k+1)}\) by
$$\begin{aligned} \varepsilon ^{(t,k+1)} = \varepsilon ^{(t+1,k-1)} + \left[ \varepsilon ^{(t+1,k)} - \varepsilon ^{(t,k)} \right]^{-1}, \qquad k = 0, 1, 2, \ldots . \end{aligned}$$
For practical implementation, we apply the vector \(\varepsilon \) algorithm for \(k=1\) to accelerate the convergence of \(\{\theta ^{(t)}\}_{t \ge 0}\). From Eq. (14), we have
$$\begin{aligned} \varepsilon ^{(t,1)} = \varepsilon ^{(t+1,-1)} + \left[ \varepsilon ^{(t+1,0)} - \varepsilon ^{(t,0)} \right]^{-1} = \left[ \theta ^{(t+1)} - \theta ^{(t)} \right]^{-1}. \end{aligned}$$
Then the vector \(\varepsilon ^{(t,2)}\) becomes as follows:
$$\begin{aligned} \varepsilon ^{(t,2)}&= \varepsilon ^{(t+1,0)} + \left[ \varepsilon ^{(t+1,1)} - \varepsilon ^{(t,1)} \right]^{-1} \\&= \theta ^{(t+1)} + \left[ \left[ \theta ^{(t+2)} - \theta ^{(t+1)} \right]^{-1} - \left[ \theta ^{(t+1)} - \theta ^{(t)} \right]^{-1} \right]^{-1}. \end{aligned}$$
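As a concrete illustration (a sketch of ours, not code from the paper: the linear toy sequence and all variable names below are our own), the \(k=1\) update can be coded directly. For a sequence whose error decays at a single geometric rate, \(\varepsilon ^{(t,2)}\) recovers the limit exactly:

```python
import numpy as np

def vinv(x):
    # Wynn's vector inverse: [x]^{-1} = x / ||x||^2
    return x / np.dot(x, x)

def eps2(t0, t1, t2):
    # epsilon^{(t,2)} computed from three consecutive iterates
    # theta^{(t)}, theta^{(t+1)}, theta^{(t+2)}
    return t1 + vinv(vinv(t2 - t1) - vinv(t1 - t0))

# Toy sequence theta^{(t+1)} = b + rho * theta^{(t)} (illustrative only):
# the error theta^{(t)} - theta^{(inf)} shrinks by the factor rho each step.
rho, b = 0.8, np.array([0.2, 0.4])
limit = b / (1.0 - rho)              # fixed point of the map
seq = [np.zeros(2)]
for _ in range(2):
    seq.append(b + rho * seq[-1])

acc = eps2(seq[0], seq[1], seq[2])   # extrapolated estimate
```

For sequences with several decay rates, such as real EM sequences, the extrapolation is not exact but still accelerates convergence.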
Appendix 2: Pseudo-code of the \(\varepsilon \)R-accelerated EM algorithm
Initialization
We set the initial value \(\theta _{0}\) of the EM step, the desired precision \(\delta \), the re-starting threshold \(\delta _{Re}\;(>\delta )\), and the size of decrement \(10^{-k}\), and we determine the maximum number of iterations (\(itrmax\)).
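The re-starting loop can be sketched as follows. This is a minimal Python illustration of ours, not the paper's full pseudo-code: the helper names (`em_restart`, `eps_accelerate`, `vinv`) are our own, the decrement schedule for \(\delta _{Re}\) is omitted, and we simply re-start from the accelerated estimate whenever it increases the observed log-likelihood:

```python
import numpy as np

def vinv(x):
    # Wynn's vector inverse: [x]^{-1} = x / ||x||^2
    n = np.dot(x, x)
    return x / n if n > 0 else np.full_like(x, np.inf)

def eps_accelerate(t0, t1, t2):
    # vector epsilon acceleration (k = 1) from three consecutive EM iterates
    return t1 + vinv(vinv(t2 - t1) - vinv(t1 - t0))

def em_restart(em_step, loglik, theta0, delta=1e-9, itrmax=1000):
    # Run EM; whenever the epsilon-accelerated estimate psi attains a larger
    # observed log-likelihood than the current EM estimate, re-start the
    # EM sequence from psi.
    hist = [np.asarray(theta0, dtype=float)]
    for _ in range(itrmax):
        hist.append(em_step(hist[-1]))
        if len(hist) >= 3:
            psi = eps_accelerate(hist[-3], hist[-2], hist[-1])
            if np.all(np.isfinite(psi)) and loglik(psi) > loglik(hist[-1]):
                hist = [psi]               # re-start the EM iterations
                continue
        if np.linalg.norm(hist[-1] - hist[-2]) < delta:
            return hist[-1]
    return hist[-1]
```

Here `em_step` (the EM mapping \(M\)) and `loglik` (the observed log-likelihood \(L_{o}\)) are problem-specific functions supplied by the user.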
Appendix 3: The AEM algorithm of Jamshidian and Jennrich (1993)
We briefly introduce the AEM algorithm, which applies the generalized conjugate gradient (CG) algorithm to accelerate the EM algorithm. The idea of the AEM algorithm is that the change in \(\theta '\) produced by one EM iteration can be viewed approximately as a generalized gradient. Thus the AEM algorithm treats the EM step as a generalized gradient and uses the generalized CG algorithm as an EM accelerator. The generalized gradient \(\tilde{\mathbf {g}}(\theta ')\) is given by
$$\begin{aligned} \tilde{\mathbf {g}}(\theta ') = M(\theta ')-\theta ', \end{aligned}$$
where \(M\) denotes the EM mapping.
In the AEM algorithm, the EM algorithm first runs until the difference between \(2 L_{o}(\theta ^{(t)})\) and \(2 L_{o}(\theta ^{(t-1)})\) falls below one. The CG accelerator then updates the EM estimate \(\theta ^{(t)}\) to obtain \(\theta ^{(t+1)}\):
1. Set \(t=0\) and \(\mathbf {d}^{(t)}=\tilde{\mathbf {g}}(\theta ^{(t)})\).
2. Find \(\alpha ^{(t)}\) to maximize \(L_{o}(\theta ^{(t)}+\alpha \mathbf {d}^{(t)})\) using a line search algorithm.
3. Update \(\theta ^{(t+1)} = \theta ^{(t)}+\alpha ^{(t)} \mathbf {d}^{(t)}\).
4. Compute
$$\begin{aligned} \tilde{\mathbf {g}}(\theta ^{(t+1)})&= M(\theta ^{(t+1)})-\theta ^{(t+1)}, \\ \gamma ^{(t)}&= \frac{ \{\mathbf {g}(\theta ^{(t+1)})-\mathbf {g}(\theta ^{(t)})\}^{\top } \tilde{\mathbf {g}}(\theta ^{(t+1)}) }{ \{\mathbf {g}(\theta ^{(t+1)})-\mathbf {g}(\theta ^{(t)})\}^{\top } \mathbf {d}^{(t)}}, \end{aligned}$$
where \(\mathbf {g}(\theta )\) is the gradient of \(L_{o}(\theta )\).
5. Update \(\mathbf {d}^{(t+1)}=\tilde{\mathbf {g}}(\theta ^{(t+1)})-\gamma ^{(t)} \mathbf {d}^{(t)}\) and then set \(t=t+1\).
6. Repeat Steps 2 to 5 until \(|| \tilde{\mathbf {g}}(\theta ^{(t)}) ||^{2} \le \delta \).
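The steps above can be sketched in code. Everything besides the AEM loop itself is a stand-in of ours: we use a toy quadratic log-likelihood and a damped linear map as the "EM" update (the matrices `A`, `D`, the maximizer `mu`, and all function names are illustrative choices, not from the paper), and the line search of Step 2 is done in closed form by exploiting the quadratic objective rather than by a generic line search:

```python
import numpy as np

# Toy stand-ins for an EM problem (illustrative assumptions):
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # curvature of the toy log-likelihood
mu = np.array([1.0, -2.0])               # its maximizer
D = np.diag([0.5, 0.2])                  # damping of the toy "EM" map

def loglik(th):                          # L_o(theta), quadratic toy objective
    r = th - mu
    return -0.5 * r @ A @ r

def grad(th):                            # g(theta), gradient of L_o
    return A @ (mu - th)

def em_map(th):                          # M(theta): one toy "EM" iteration
    return th + D @ (mu - th)

def g_tilde(th):                         # generalized gradient M(theta) - theta
    return em_map(th) - th

def aem(theta, delta=1e-12, itrmax=100):
    d = g_tilde(theta)                                 # Step 1
    for _ in range(itrmax):
        if g_tilde(theta) @ g_tilde(theta) <= delta:   # Step 6 stopping rule
            return theta
        alpha = (grad(theta) @ d) / (d @ A @ d)        # Step 2 (exact for a quadratic)
        theta_new = theta + alpha * d                  # Step 3
        dg = grad(theta_new) - grad(theta)             # Step 4
        gamma = (dg @ g_tilde(theta_new)) / (dg @ d)
        d = g_tilde(theta_new) - gamma * d             # Step 5
        theta = theta_new
    return theta
```

On this two-dimensional quadratic, the conjugacy enforced by \(\gamma ^{(t)}\) makes the loop terminate in at most two line searches; on a real EM problem, `em_map`, `loglik`, and `grad` would come from the model and Step 2 would use a numerical line search.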
Cite this article
Kuroda, M., Geng, Z. & Sakakihara, M. Improving the vector \(\varepsilon \) acceleration for the EM algorithm using a re-starting procedure. Comput Stat 30, 1051–1077 (2015). https://doi.org/10.1007/s00180-015-0565-y