
Improving the vector \(\varepsilon \) acceleration for the EM algorithm using a re-starting procedure

Original Paper · Computational Statistics

Abstract

The expectation–maximization (EM) algorithm is a popular algorithm for finding maximum likelihood estimates from incomplete data. However, the EM algorithm converges slowly when the proportion of missing data is large. Although many acceleration algorithms have been proposed, they require complex calculations. Kuroda and Sakakihara (Comput Stat Data Anal 51:1549–1561, 2006) developed the \(\varepsilon \)-accelerated EM algorithm, which uses only the sequence of estimates obtained by the EM algorithm to generate an accelerated sequence, without changing the original EM sequence. We find that the accelerated sequence often has a larger likelihood than the current estimate obtained by the EM algorithm. Thus, in this paper, we re-start the EM iterations from the accelerated sequence and then generate a new EM sequence that converges faster. This algorithm has the further advantage of simple implementation, since it uses only the EM iterations and re-starts them from an estimate with a larger likelihood. The re-starting algorithm, called the \(\varepsilon \)R-accelerated EM algorithm, improves on both the EM algorithm and the \(\varepsilon \)-accelerated EM algorithm in the sense that it reduces the number of iterations and the computation time.


Notes

  1. Times are typically available to a precision of 10 msec.

References

  • Biernacki C, Celeux G, Govaert G (2003) Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput Stat Data Anal 41:561–575

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–22

  • Jamshidian M, Jennrich RI (1993) Conjugate gradient acceleration of the EM algorithm. J Am Stat Assoc 88:221–228

  • Jamshidian M, Jennrich RI (1997) Acceleration of the EM algorithm by using quasi-Newton methods. J R Stat Soc Ser B 59:569–587

  • Jones O, Maillardet R, Robinson A (2009) Introduction to scientific programming and simulation using R. Chapman & Hall/CRC, Boca Raton

  • Karlis D, Xekalaki E (2003) Choosing initial values for the EM algorithm for finite mixtures. Comput Stat Data Anal 41:577–590

  • Kuroda M, Sakakihara M (2006) Accelerating the convergence of the EM algorithm using the vector \(\varepsilon \) algorithm. Comput Stat Data Anal 51:1549–1561

  • Laird NM, Lange K, Stram DO (1987) Maximum likelihood computations with repeated measures: application of the EM algorithm. J Am Stat Assoc 82:97–105

  • Lange K (1995) A quasi-Newton acceleration of the EM algorithm. Stat Sin 5:1–18

  • Lee G, Scott C (2012) EM algorithms for multivariate Gaussian mixture models with truncated and censored data. Comput Stat Data Anal 56:2816–2829

  • Lee G, Finn W, Scott C (2011) Statistical file matching of flow cytometry data. J Biomed Inform 44:663–676

  • Lin TI (2009) Maximum likelihood estimation for multivariate skew normal mixture models. J Multivar Anal 100:257–265

  • Little RJA, Rubin DB (1987) Statistical analysis with missing data. Wiley, New York

  • Louis TA (1982) Finding the observed information matrix when using the EM algorithm. J R Stat Soc Ser B 44:226–233

  • McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York

  • Meng XL, Rubin DB (1994) On the global and componentwise rates of convergence of the EM algorithm. Linear Algebra Appl 199:413–425

  • Pyne S, Hu X, Wang K, Rossin E, Lin TI, Maier LM, Baecher-Allan C, McLachlan GJ, Tamayo P, Hafler DA, De Jager PL, Mesirov JP (2009) Automated high-dimensional flow cytometry data analysis. Proc Natl Acad Sci USA 106:8519–8524

  • R Development Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org

  • Schafer JL (1997) Analysis of incomplete multivariate data. Chapman & Hall/CRC, London

  • Snedecor GW, Cochran WG (1967) Statistical methods. Iowa State University Press, Iowa

  • Wang M, Kuroda M, Sakakihara M, Geng Z (2008) Acceleration of the EM algorithm using the vector epsilon algorithm. Comput Stat 23:469–486

  • Wynn P (1962) Acceleration techniques for iterated vector and matrix problems. Math Comp 16:301–322


Acknowledgments

The authors would like to thank the editor and two referees for their valuable comments and helpful suggestions. This research is supported by the Japan Society for the Promotion of Science (JSPS), Grant-in-Aid for Scientific Research (C), No. 24500353.

Author information

Correspondence to Masahiro Kuroda.

Appendices

Appendix 1: The vector \(\varepsilon \) algorithm

Let \(\theta ^{(t)}\) denote a vector of dimensionality \(d\) that converges to a vector \(\theta ^{(\infty )}\) as \(t \rightarrow \infty \). Let the inverse \([\theta ]^{-1}\) of a vector \(\theta \) be defined by

$$\begin{aligned}{}[\theta ]^{-1} = \frac{\theta }{\Vert \theta \Vert ^{2}}, \end{aligned}$$

where \(\Vert \theta \Vert \) is the Euclidean norm of \(\theta \).
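For concreteness, the following is a minimal numerical sketch of this vector inverse (in Python with NumPy; the function name vinv is ours, not from the paper):

```python
import numpy as np

def vinv(x):
    """Samelson inverse of a vector: [x]^{-1} = x / ||x||^2."""
    x = np.asarray(x, dtype=float)
    return x / np.dot(x, x)

# Example: [(3, 4)]^{-1} = (3, 4) / 25 = (0.12, 0.16)
print(vinv([3.0, 4.0]))  # -> [0.12 0.16]
```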

In general, the vector \(\varepsilon \) algorithm for a sequence \(\{\theta ^{(t)}\}_{t \ge 0}\) starts with

$$\begin{aligned} \varepsilon ^{(t,-1)}=0, \qquad \varepsilon ^{(t,0)}=\theta ^{(t)}, \end{aligned}$$

and then generates a vector \(\varepsilon ^{(t,k+1)}\) by

$$\begin{aligned} \varepsilon ^{(t,k+1)} = \varepsilon ^{(t+1,k-1)}+ \left[ \varepsilon ^{(t+1,k)}- \varepsilon ^{(t,k)} \right] ^{-1}, \qquad k=0,1,2,\ldots . \end{aligned}$$
(14)

For practical implementation, we apply the vector \(\varepsilon \) algorithm for \(k=1\) to accelerate the convergence of \(\{\theta ^{(t)}\}_{t \ge 0}\). From Eq. (14), we have

$$\begin{aligned} \varepsilon ^{(t,2)}&= \varepsilon ^{(t+1,0)}+ \left[ \varepsilon ^{(t+1,1)}- \varepsilon ^{(t,1)} \right] ^{-1} \quad \text{ for } k=1, \\ \varepsilon ^{(t,1)}&= \varepsilon ^{(t+1,-1)}+ \left[ \varepsilon ^{(t+1,0)}- \varepsilon ^{(t,0)} \right] ^{-1} = \left[ \varepsilon ^{(t+1,0)}- \varepsilon ^{(t,0)} \right] ^{-1} \quad \text{ for } k=0. \end{aligned}$$

Then the vector \(\varepsilon ^{(t,2)}\) can be written as follows:

$$\begin{aligned} \varepsilon ^{(t,2)}&= \varepsilon ^{(t+1,0)} + \left[ \left[ \varepsilon ^{(t,0)}-\varepsilon ^{(t+1,0)}\right] ^{-1} +\left[ \varepsilon ^{(t+2,0)}-\varepsilon ^{(t+1,0)} \right] ^{-1} \right] ^{-1} \\&= \theta ^{(t+1)} + \left[ \left[ \theta ^{(t)}-\theta ^{(t+1)} \right] ^{-1} +\left[ \theta ^{(t+2)}-\theta ^{(t+1)}\right] ^{-1} \right] ^{-1}. \end{aligned}$$
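As an illustrative sketch (Python with NumPy; vinv and eps2 are our names, not from the paper), the accelerated vector \(\varepsilon ^{(t,2)}\) can be computed directly from three consecutive EM iterates:

```python
import numpy as np

def vinv(x):
    # vector inverse [x]^{-1} = x / ||x||^2
    return x / np.dot(x, x)

def eps2(t0, t1, t2):
    """Accelerated vector eps^(t,2) from theta^(t), theta^(t+1), theta^(t+2)."""
    t0, t1, t2 = (np.asarray(v, dtype=float) for v in (t0, t1, t2))
    return t1 + vinv(vinv(t0 - t1) + vinv(t2 - t1))

# For the linearly convergent scalar sequence 0, 0.5, 0.75 (limit 1),
# the accelerated estimate already hits the limit:
print(eps2([0.0], [0.5], [0.75]))  # -> [1.]
```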

Appendix 2: Pseudo-code of the \(\varepsilon \)R-accelerated EM algorithm

Initialization

We set the initial value \(\theta _{0}\) of the EM step, the desired precision \(\delta \), the threshold \(\delta _{Re}\,(>\delta )\) and the size of the decrement \(10^{-k}\), and determine the maximum number of iterations (\(itrmax\)).

[Pseudo-code of the \(\varepsilon \)R-accelerated EM algorithm, given as a figure in the original.]
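Since the pseudo-code figure is not reproduced here, the following is a rough sketch only, based on the description in the abstract and the initialization above, of how the re-starting loop might be organized. It assumes hypothetical user-supplied callables em_step (one EM update \(M(\theta )\)) and loglik (the observed-data log-likelihood); the exact schedule of the re-starting condition is given in the original figure.

```python
import numpy as np

def vinv(x):
    # vector inverse [x]^{-1} = x / ||x||^2
    return x / np.dot(x, x)

def epsR_em(theta0, em_step, loglik, delta=1e-12, delta_re=1e-4,
            k=1, itrmax=100000):
    """Rough sketch of the epsilon-R-accelerated EM loop; em_step and
    loglik are hypothetical user-supplied callables."""
    t0 = np.asarray(theta0, dtype=float)
    t1 = em_step(t0)
    t2 = em_step(t1)
    acc_prev = None
    for _ in range(itrmax):
        # vector-epsilon acceleration from three consecutive EM iterates
        acc = t1 + vinv(vinv(t0 - t1) + vinv(t2 - t1))
        if acc_prev is not None:
            d2 = np.sum((acc - acc_prev) ** 2)
            if d2 < delta:          # accelerated sequence has converged
                return acc
            if d2 < delta_re and loglik(acc) > loglik(t2):
                # re-start EM from the estimate with the larger likelihood
                # and tighten the re-starting threshold by the decrement 10^{-k}
                t0 = acc
                t1 = em_step(t0)
                t2 = em_step(t1)
                delta_re *= 10.0 ** (-k)
                acc_prev = None
                continue
        acc_prev = acc
        # advance the underlying EM sequence by one step
        t0, t1, t2 = t1, t2, em_step(t2)
    return acc
```

In this sketch, a re-start resets the three-iterate window, so the acceleration is subsequently applied to the new EM sequence started from the estimate with the larger likelihood.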

Appendix 3: The AEM algorithm of Jamshidian and Jennrich (1993)

We briefly introduce the AEM algorithm, which applies the generalized conjugate gradient (CG) algorithm to accelerate the EM algorithm. The idea of the AEM algorithm is that the change in \(\theta '\) after an EM iteration, \(\tilde{\mathbf {g}}(\theta ') = M(\theta ')-\theta '\), can be viewed approximately as a generalized gradient. Thus the AEM algorithm treats the EM step as a generalized gradient and uses the generalized CG algorithm as an EM accelerator. The generalized gradient \(\tilde{\mathbf {g}}(\theta ')\) is given by

$$\begin{aligned} \tilde{\mathbf {g}}(\theta ') = M(\theta ')-\theta ' \approx - \left[ \left. \frac{\partial ^{2} Q(\theta |\theta ')}{\partial \theta \partial \theta ^{\top }} \right| _{\theta =\theta '} \right] ^{-1} \left. \frac{\partial L_{o}(\theta )}{\partial \theta } \right| _{\theta =\theta '}. \end{aligned}$$

In the AEM algorithm, the EM algorithm first runs until the difference between \(2 L_{o}(\theta ^{(t)})\) and \(2 L_{o}(\theta ^{(t-1)})\) falls below one. Then the CG accelerator updates the EM estimate \(\theta ^{(t)}\) to obtain \(\theta ^{(t+1)}\):

  1. Set \(t=0\) and \(\mathbf {d}^{(t)}=\tilde{\mathbf {g}}(\theta ^{(t)})\).

  2. Find \(\alpha ^{(t)}\) to maximize \(L_{o}(\theta ^{(t)}+\alpha \mathbf {d}^{(t)})\) using a line search algorithm.

  3. Update \(\theta ^{(t+1)} = \theta ^{(t)}+\alpha ^{(t)} \mathbf {d}^{(t)}\).

  4. Compute

     $$\begin{aligned} \tilde{\mathbf {g}}(\theta ^{(t+1)})&= M(\theta ^{(t+1)})-\theta ^{(t+1)}, \\ \gamma ^{(t)}&= \frac{ \{\mathbf {g}(\theta ^{(t+1)})-\mathbf {g}(\theta ^{(t)})\}^{\top } \tilde{\mathbf {g}}(\theta ^{(t+1)}) }{ \{\mathbf {g}(\theta ^{(t+1)})-\mathbf {g}(\theta ^{(t)})\}^{\top } \mathbf {d}^{(t)}}, \end{aligned}$$

     where \(\mathbf {g}(\theta )\) is the gradient of \(L_{o}(\theta )\).

  5. Update \(\mathbf {d}^{(t+1)}=\tilde{\mathbf {g}}(\theta ^{(t+1)})-\gamma ^{(t)} \mathbf {d}^{(t)}\) and then set \(t=t+1\).

  6. Repeat Steps 2 to 5 until \(\Vert \tilde{\mathbf {g}}(\theta ^{(t)}) \Vert ^{2} \le \delta \).
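For illustration, a minimal sketch of this loop (Python with NumPy/SciPy; em_step, loglik and grad are hypothetical callables standing in for \(M(\theta )\), \(L_{o}(\theta )\) and its gradient \(\mathbf {g}(\theta )\); a generic scalar minimizer stands in for the paper's line search, and the preliminary EM iterations that precede the acceleration are omitted):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def aem(theta0, em_step, loglik, grad, delta=1e-8, itrmax=1000):
    """Sketch of the AEM (generalized CG) accelerator; em_step, loglik
    and grad are hypothetical user-supplied callables for the model."""
    theta = np.asarray(theta0, dtype=float)
    g_tilde = em_step(theta) - theta            # generalized gradient
    d = g_tilde.copy()                          # Step 1: initial direction
    g = grad(theta)
    for _ in range(itrmax):
        if np.dot(g_tilde, g_tilde) <= delta:   # Step 6: stopping rule
            return theta
        # Step 2: line search maximizing L_o(theta + alpha * d)
        alpha = minimize_scalar(lambda a: -loglik(theta + a * d)).x
        theta_new = theta + alpha * d           # Step 3
        g_tilde_new = em_step(theta_new) - theta_new   # Step 4
        g_new = grad(theta_new)
        y = g_new - g
        gamma = np.dot(y, g_tilde_new) / np.dot(y, d)
        d = g_tilde_new - gamma * d             # Step 5: new direction
        theta, g, g_tilde = theta_new, g_new, g_tilde_new
    return theta
```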


Cite this article

Kuroda, M., Geng, Z. & Sakakihara, M. Improving the vector \(\varepsilon \) acceleration for the EM algorithm using a re-starting procedure. Comput Stat 30, 1051–1077 (2015). https://doi.org/10.1007/s00180-015-0565-y
