
Improving the vector \(\varepsilon \) acceleration for the EM algorithm using a re-starting procedure

Original Paper · Computational Statistics

Abstract

The expectation–maximization (EM) algorithm is a popular algorithm for finding maximum likelihood estimates from incomplete data. However, the EM algorithm converges slowly when the proportion of missing data is large. Although many acceleration algorithms have been proposed, they require complex calculations. Kuroda and Sakakihara (Comput Stat Data Anal 51:1549–1561, 2006) developed the \(\varepsilon \)-accelerated EM algorithm, which uses only the sequence of estimates obtained by the EM algorithm to generate an accelerated sequence, without changing the original EM sequence. We find that the accelerated sequence often has a larger likelihood than the current estimate obtained by the EM algorithm. Thus, in this paper, we re-start the EM iterations from the accelerated sequence and then generate a new EM sequence that converges faster. This algorithm has the further advantage of simple implementation, since it uses only the EM iterations and re-starts them from an estimate with a larger likelihood. The re-starting algorithm, called the \(\varepsilon \)R-accelerated EM algorithm, improves on both the EM algorithm and the \(\varepsilon \)-accelerated EM algorithm in the sense that it reduces the number of iterations and the computation time.


Notes

  1. Times are typically available to a precision of 10 msec.

References

  • Biernacki C, Celeux G, Govaert G (2003) Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput Stat Data Anal 41:561–575

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–22

  • Jamshidian M, Jennrich RI (1993) Conjugate gradient acceleration of the EM algorithm. J Am Stat Assoc 88:221–228

  • Jamshidian M, Jennrich RI (1997) Acceleration of the EM algorithm by using quasi-Newton methods. J R Stat Soc Ser B 59:569–587

  • Jones O, Maillardet R, Robinson A (2009) Introduction to scientific programming and simulation using R. Chapman & Hall/CRC, Boca Raton

  • Karlis D, Xekalaki E (2003) Choosing initial values for the EM algorithm for finite mixtures. Comput Stat Data Anal 41:577–590

  • Kuroda M, Sakakihara M (2006) Accelerating the convergence of the EM algorithm using the vector \(\varepsilon \) algorithm. Comput Stat Data Anal 51:1549–1561

  • Laird NM, Lange K, Stram DO (1987) Maximum likelihood computations with repeated measures: application of the EM algorithm. J Am Stat Assoc 82:97–105

  • Lange K (1995) A quasi-Newton acceleration of the EM algorithm. Stat Sin 5:1–18

  • Lee G, Scott C (2012) EM algorithms for multivariate Gaussian mixture models with truncated and censored data. Comput Stat Data Anal 56:2816–2829

  • Lee G, Finn W, Scott C (2011) Statistical file matching of flow cytometry data. J Biomed Inform 44:663–676

  • Lin TI (2009) Maximum likelihood estimation for multivariate skew normal mixture models. J Multivar Anal 100:257–265

  • Little RJA, Rubin DB (1987) Statistical analysis with missing data. Wiley, New York

  • Louis TA (1982) Finding the observed information matrix when using the EM algorithm. J R Stat Soc Ser B 44:226–233

  • McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York

  • Meng XL, Rubin DB (1994) On the global and componentwise rates of convergence of the EM algorithm. Linear Algebra Appl 199:413–425

  • Pyne S, Hu X, Wang K, Rossin E, Lin TI, Maier LM, Baecher-Allan C, McLachlan GJ, Tamayo P, Hafler DA, De Jager PL, Mesirov JP (2009) Automated high-dimensional flow cytometry data analysis. Proc Natl Acad Sci USA 106:8519–8524

  • R Development Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org

  • Schafer JL (1997) Analysis of incomplete multivariate data. Chapman & Hall/CRC, London

  • Snedecor GW, Cochran WG (1967) Statistical methods. Iowa State University Press, Iowa

  • Wang M, Kuroda M, Sakakihara M, Geng Z (2008) Acceleration of the EM algorithm using the vector epsilon algorithm. Comput Stat 23:469–486

  • Wynn P (1962) Acceleration techniques for iterated vector and matrix problems. Math Comp 16:301–322


Acknowledgments

The authors would like to thank the editor and two referees for their valuable comments and helpful suggestions. This research is supported by the Japan Society for the Promotion of Science (JSPS), Grant-in-Aid for Scientific Research (C), No. 24500353.

Author information

Correspondence to Masahiro Kuroda.

Appendices

Appendix 1: The vector \(\varepsilon \) algorithm

Let \(\theta ^{(t)}\) denote a vector of dimensionality \(d\) that converges to a vector \(\theta ^{(\infty )}\) as \(t \rightarrow \infty \). Let the inverse \([\theta ]^{-1}\) of a vector \(\theta \) be defined by

$$\begin{aligned}{}[\theta ]^{-1} = \frac{\theta }{\Vert \theta \Vert ^{2}}, \end{aligned}$$

where \(\Vert \theta \Vert \) is the Euclidean norm of \(\theta \).
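For concreteness, the following is a minimal numerical sketch of this vector inverse (in Python with NumPy; the function name vinv is ours, not from the paper):

```python
import numpy as np

def vinv(x):
    """Samelson inverse of a vector: [x]^{-1} = x / ||x||^2."""
    x = np.asarray(x, dtype=float)
    return x / np.dot(x, x)

# Example: [(3, 4)]^{-1} = (3, 4) / 25 = (0.12, 0.16)
print(vinv([3.0, 4.0]))  # -> [0.12 0.16]
```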

In general, the vector \(\varepsilon \) algorithm for a sequence \(\{\theta ^{(t)}\}_{t \ge 0}\) starts with

$$\begin{aligned} \varepsilon ^{(t,-1)}=0, \qquad \varepsilon ^{(t,0)}=\theta ^{(t)}, \end{aligned}$$

and then generates a vector \(\varepsilon ^{(t,k+1)}\) by

$$\begin{aligned} \varepsilon ^{(t,k+1)} = \varepsilon ^{(t+1,k-1)}+ \left[ \varepsilon ^{(t+1,k)}- \varepsilon ^{(t,k)} \right] ^{-1}, \qquad k=0,1,2,\ldots . \end{aligned}$$
(14)

For practical implementation, we apply the vector \(\varepsilon \) algorithm for \(k=1\) to accelerate the convergence of \(\{\theta ^{(t)}\}_{t \ge 0}\). From Eq. (14), we have

$$\begin{aligned} \varepsilon ^{(t,2)}&= \varepsilon ^{(t+1,0)}+ \left[ \varepsilon ^{(t+1,1)}- \varepsilon ^{(t,1)} \right] ^{-1} \quad \text{ for } k=1, \\ \varepsilon ^{(t,1)}&= \varepsilon ^{(t+1,-1)}+ \left[ \varepsilon ^{(t+1,0)}- \varepsilon ^{(t,0)} \right] ^{-1} = \left[ \varepsilon ^{(t+1,0)}- \varepsilon ^{(t,0)} \right] ^{-1} \quad \text{ for } k=0. \end{aligned}$$

Then the vector \(\varepsilon ^{(t,2)}\) can be written as follows:

$$\begin{aligned} \varepsilon ^{(t,2)}&= \varepsilon ^{(t+1,0)} + \left[ \left[ \varepsilon ^{(t,0)}-\varepsilon ^{(t+1,0)}\right] ^{-1} +\left[ \varepsilon ^{(t+2,0)}-\varepsilon ^{(t+1,0)} \right] ^{-1} \right] ^{-1} \\&= \theta ^{(t+1)} + \left[ \left[ \theta ^{(t)}-\theta ^{(t+1)} \right] ^{-1} +\left[ \theta ^{(t+2)}-\theta ^{(t+1)}\right] ^{-1} \right] ^{-1}. \end{aligned}$$
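As an illustrative sketch (Python with NumPy; vinv and eps2 are our names, not from the paper), the accelerated vector \(\varepsilon ^{(t,2)}\) can be computed directly from three consecutive EM iterates:

```python
import numpy as np

def vinv(x):
    # vector inverse [x]^{-1} = x / ||x||^2
    return x / np.dot(x, x)

def eps2(t0, t1, t2):
    """Accelerated vector eps^(t,2) from theta^(t), theta^(t+1), theta^(t+2)."""
    t0, t1, t2 = (np.asarray(v, dtype=float) for v in (t0, t1, t2))
    return t1 + vinv(vinv(t0 - t1) + vinv(t2 - t1))

# For the linearly convergent scalar sequence 0, 0.5, 0.75 (limit 1),
# the accelerated estimate already hits the limit:
print(eps2([0.0], [0.5], [0.75]))  # -> [1.]
```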

Appendix 2: Pseudo-code of the \(\varepsilon \)R-accelerated EM algorithm

Initialization

We set the initial value \(\theta _{0}\) of the EM step, the desired precision \(\delta \), the threshold \(\delta _{Re}\,(>\delta )\) and the size of the decrement \(10^{-k}\), and determine the maximum number of iterations (\(itrmax\)).

[Pseudo-code of the \(\varepsilon \)R-accelerated EM algorithm, given as a figure in the original.]
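Since the pseudo-code figure is not reproduced here, the following is a rough sketch only, based on the description in the abstract and the initialization above, of how the re-starting loop might be organized. It assumes hypothetical user-supplied callables em_step (one EM update \(M(\theta )\)) and loglik (the observed-data log-likelihood); the exact schedule of the re-starting condition is given in the original figure.

```python
import numpy as np

def vinv(x):
    # vector inverse [x]^{-1} = x / ||x||^2
    return x / np.dot(x, x)

def epsR_em(theta0, em_step, loglik, delta=1e-12, delta_re=1e-4,
            k=1, itrmax=100000):
    """Rough sketch of the epsilon-R-accelerated EM loop; em_step and
    loglik are hypothetical user-supplied callables."""
    t0 = np.asarray(theta0, dtype=float)
    t1 = em_step(t0)
    t2 = em_step(t1)
    acc_prev = None
    for _ in range(itrmax):
        # vector-epsilon acceleration from three consecutive EM iterates
        acc = t1 + vinv(vinv(t0 - t1) + vinv(t2 - t1))
        if acc_prev is not None:
            d2 = np.sum((acc - acc_prev) ** 2)
            if d2 < delta:          # accelerated sequence has converged
                return acc
            if d2 < delta_re and loglik(acc) > loglik(t2):
                # re-start EM from the estimate with the larger likelihood
                # and tighten the re-starting threshold by the decrement 10^{-k}
                t0 = acc
                t1 = em_step(t0)
                t2 = em_step(t1)
                delta_re *= 10.0 ** (-k)
                acc_prev = None
                continue
        acc_prev = acc
        # advance the underlying EM sequence by one step
        t0, t1, t2 = t1, t2, em_step(t2)
    return acc
```

In this sketch, a re-start resets the three-iterate window, so the acceleration is subsequently applied to the new EM sequence started from the estimate with the larger likelihood.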

Appendix 3: The AEM algorithm of Jamshidian and Jennrich (1993)

We briefly introduce the AEM algorithm, which applies the generalized conjugate gradient (CG) algorithm to accelerate the EM algorithm. The idea of the AEM algorithm is that the change in \(\theta '\) after an EM iteration, \(\tilde{\mathbf {g}}(\theta ') = M(\theta ')-\theta '\), can be viewed approximately as a generalized gradient. Thus the AEM algorithm treats the EM step as a generalized gradient and uses the generalized CG algorithm as an EM accelerator. The generalized gradient \(\tilde{\mathbf {g}}(\theta ')\) is given by

$$\begin{aligned} \tilde{\mathbf {g}}(\theta ') = M(\theta ')-\theta ' \approx - \left[ \left. \frac{\partial ^{2} Q(\theta |\theta ')}{\partial \theta \partial \theta ^{\top }} \right| _{\theta =\theta '} \right] ^{-1} \left. \frac{\partial L_{o}(\theta )}{\partial \theta } \right| _{\theta =\theta '}. \end{aligned}$$

In the AEM algorithm, the EM algorithm first runs until the difference between \(2 L_{o}(\theta ^{(t)})\) and \(2 L_{o}(\theta ^{(t-1)})\) falls below one. Then the CG accelerator updates the EM estimate \(\theta ^{(t)}\) to obtain \(\theta ^{(t+1)}\):

  1. Set \(t=0\) and \(\mathbf {d}^{(t)}=\tilde{\mathbf {g}}(\theta ^{(t)})\).

  2. Find \(\alpha ^{(t)}\) to maximize \(L_{o}(\theta ^{(t)}+\alpha \mathbf {d}^{(t)})\) using a line search algorithm.

  3. Update \(\theta ^{(t+1)} = \theta ^{(t)}+\alpha ^{(t)} \mathbf {d}^{(t)}\).

  4. Compute

     $$\begin{aligned} \tilde{\mathbf {g}}(\theta ^{(t+1)})&= M(\theta ^{(t+1)})-\theta ^{(t+1)}, \\ \gamma ^{(t)}&= \frac{ \{\mathbf {g}(\theta ^{(t+1)})-\mathbf {g}(\theta ^{(t)})\}^{\top } \tilde{\mathbf {g}}(\theta ^{(t+1)}) }{ \{\mathbf {g}(\theta ^{(t+1)})-\mathbf {g}(\theta ^{(t)})\}^{\top } \mathbf {d}^{(t)}}, \end{aligned}$$

     where \(\mathbf {g}(\theta )\) is the gradient of \(L_{o}(\theta )\).

  5. Update \(\mathbf {d}^{(t+1)}=\tilde{\mathbf {g}}(\theta ^{(t+1)})-\gamma ^{(t)} \mathbf {d}^{(t)}\) and then set \(t=t+1\).

  6. Repeat Steps 2 to 5 until \(\Vert \tilde{\mathbf {g}}(\theta ^{(t)}) \Vert ^{2} \le \delta \).
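For illustration, a minimal sketch of this loop (Python with NumPy/SciPy; em_step, loglik and grad are hypothetical callables standing in for \(M(\theta )\), \(L_{o}(\theta )\) and its gradient \(\mathbf {g}(\theta )\); a generic scalar minimizer stands in for the paper's line search, and the preliminary EM iterations that precede the acceleration are omitted):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def aem(theta0, em_step, loglik, grad, delta=1e-8, itrmax=1000):
    """Sketch of the AEM (generalized CG) accelerator; em_step, loglik
    and grad are hypothetical user-supplied callables for the model."""
    theta = np.asarray(theta0, dtype=float)
    g_tilde = em_step(theta) - theta            # generalized gradient
    d = g_tilde.copy()                          # Step 1: initial direction
    g = grad(theta)
    for _ in range(itrmax):
        if np.dot(g_tilde, g_tilde) <= delta:   # Step 6: stopping rule
            return theta
        # Step 2: line search maximizing L_o(theta + alpha * d)
        alpha = minimize_scalar(lambda a: -loglik(theta + a * d)).x
        theta_new = theta + alpha * d           # Step 3
        g_tilde_new = em_step(theta_new) - theta_new   # Step 4
        g_new = grad(theta_new)
        y = g_new - g
        gamma = np.dot(y, g_tilde_new) / np.dot(y, d)
        d = g_tilde_new - gamma * d             # Step 5: new direction
        theta, g, g_tilde = theta_new, g_new, g_tilde_new
    return theta
```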


Cite this article

Kuroda, M., Geng, Z. & Sakakihara, M. Improving the vector \(\varepsilon \) acceleration for the EM algorithm using a re-starting procedure. Comput Stat 30, 1051–1077 (2015). https://doi.org/10.1007/s00180-015-0565-y
