Generalized EM estimation for semi-parametric mixture distributions with discretized non-parametric component

Statistics and Computing

Abstract

We consider independent sampling from a two-component mixture distribution, where one component (called the parametric component) is from a known distributional family and the other component (called the non-parametric component) is unknown. This is a semi-parametric mixture distribution. We discretize the non-parametric component and estimate the parameters of this mixture model, namely the mixing proportion, the unknown parameters of the parametric component and the discretized non-parametric component. We define the maximum penalized likelihood (MPL) estimates of the mixture model parameters and then develop a generalized EM (GEM) iterative scheme to compute the MPL estimates. A simulation study and an example from biology are presented.
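
To make the model concrete, the following is a minimal sketch, not the authors' GEM scheme. It assumes a normal parametric component, a non-parametric component discretized as a piecewise-constant density on user-supplied bins, and no penalty term, so it reduces to ordinary EM for the unpenalized likelihood. The names `em_semiparametric`, `normal_pdf` and `edges` are hypothetical and do not come from the article.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def em_semiparametric(x, edges, n_iter=50):
    """Plain EM for f(x) = p * N(x; mu, sigma) + (1 - p) * h(x).

    h is piecewise constant on the bins defined by `edges`.
    Assumptions (not from the article): normal parametric component,
    bins covering all observations, and no roughness penalty.
    """
    x = np.asarray(x, dtype=float)
    edges = np.asarray(edges, dtype=float)
    widths = np.diff(edges)
    m = len(widths)
    # Bin index of every observation; the right endpoint goes to the last bin.
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, m - 1)

    # Starting values: equal mixing, moment estimates, uniform histogram.
    p, mu, sigma = 0.5, x.mean(), x.std()
    h = np.full(m, 1.0 / (edges[-1] - edges[0]))
    loglik = []

    for _ in range(n_iter):
        # E-step: posterior probability that x_i came from the normal component.
        f1 = p * normal_pdf(x, mu, sigma)
        f2 = (1.0 - p) * h[idx]
        mix = f1 + f2
        loglik.append(np.log(mix).sum())  # log-likelihood before this update
        r = f1 / mix

        # M-step: closed-form updates for the mixing proportion and the normal.
        p = r.mean()
        mu = (r * x).sum() / r.sum()
        sigma = np.sqrt((r * (x - mu) ** 2).sum() / r.sum())
        # M-step for the bin heights: weighted counts, scaled to integrate to 1.
        counts = np.bincount(idx, weights=1.0 - r, minlength=m)
        h = counts / (counts.sum() * widths)

    return p, mu, sigma, h, np.array(loglik)


# Usage on simulated data: normal observations mixed with uniform ones.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.uniform(2.0, 6.0, 200)])
edges = np.linspace(x.min(), x.max(), 21)
p, mu, sigma, h, loglik = em_semiparametric(x, edges, n_iter=30)
```

Without a penalty the histogram component is free to absorb much of the data, which is one motivation for penalizing the discretized component as the abstract describes. Adding that penalty generally removes the closed-form update for the bin heights, and a generalized EM step only needs to increase the penalized objective rather than maximize it.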

Author information

Corresponding author

Correspondence to Jun Ma.

About this article

Cite this article

Ma, J., Gudlaugsdottir, S. & Wood, G. Generalized EM estimation for semi-parametric mixture distributions with discretized non-parametric component. Stat Comput 21, 601–612 (2011). https://doi.org/10.1007/s11222-010-9195-y

  • DOI: https://doi.org/10.1007/s11222-010-9195-y
