Abstract
We consider independent sampling from a two-component mixture distribution, where one component (called the parametric component) is from a known distributional family and the other component (called the non-parametric component) is unknown. This is a semi-parametric mixture distribution. We discretize the non-parametric component and estimate the parameters of this mixture model, namely the mixing proportion, the unknown parameters of the parametric component and the discretized non-parametric component. We define the maximum penalized likelihood (MPL) estimates of the mixture model parameters and then develop a generalized EM (GEM) iterative scheme to compute the MPL estimates. A simulation study and an example from biology are presented.
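To make the model structure concrete, here is a minimal sketch in Python. It is not the authors' GEM scheme. It assumes a normal parametric component with unknown mean and standard deviation, and it represents the non-parametric component as a piecewise-constant density on equal-width bins. It also omits the penalty term, so the M-step has a closed form and the iteration is plain EM rather than generalized EM. All names (`fit_mixture`, `simulate`, `n_bins`) and the simulated data are illustrative assumptions.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def simulate(n, p_true=0.6, seed=0):
    """Hypothetical test data: N(0, 1) with probability p_true, else Uniform(2, 6)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) if rng.random() < p_true else rng.uniform(2.0, 6.0)
            for _ in range(n)]


def fit_mixture(xs, n_bins=20, n_iter=50):
    """Unpenalized EM for p * N(mu, sigma^2) + (1 - p) * g, with g a histogram density."""
    n = len(xs)
    a, b = min(xs), max(xs)
    h = (b - a) / n_bins
    # Bin index of each observation; the maximum is clipped into the last bin.
    bins = [min(int((x - a) / h), n_bins - 1) for x in xs]

    # Starting values (an arbitrary choice for this sketch).
    p = 0.5
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    w = [1.0 / n_bins] * n_bins  # bin probabilities of g, summing to 1

    loglik_trace = []
    for _ in range(n_iter):
        # E-step: posterior probability that each point is from the parametric part.
        r = []
        loglik = 0.0
        for x, k in zip(xs, bins):
            f_par = p * normal_pdf(x, mu, sigma)
            f_non = (1.0 - p) * w[k] / h
            loglik += math.log(f_par + f_non)
            r.append(f_par / (f_par + f_non))
        loglik_trace.append(loglik)  # log-likelihood at the current parameters

        # M-step: closed-form weighted updates (no penalty term in this sketch).
        sr = sum(r)
        p = sr / n
        mu = sum(ri * x for ri, x in zip(r, xs)) / sr
        sigma = math.sqrt(sum(ri * (x - mu) ** 2 for ri, x in zip(r, xs)) / sr)
        mass = [0.0] * n_bins
        for ri, k in zip(r, bins):
            mass[k] += 1.0 - ri
        w = [m / (n - sr) for m in mass]

    return p, mu, sigma, w, loglik_trace


xs = simulate(400)
p_hat, mu_hat, sigma_hat, w_hat, trace = fit_mixture(xs)
```

Without a penalty the histogram component is flexible enough to absorb part of the parametric component, so the estimates from this sketch need not be close to the simulation values. That weakness is one motivation for penalizing the discretized component, as the abstract describes.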
Cite this article
Ma, J., Gudlaugsdottir, S. & Wood, G. Generalized EM estimation for semi-parametric mixture distributions with discretized non-parametric component. Stat Comput 21, 601–612 (2011). https://doi.org/10.1007/s11222-010-9195-y
DOI: https://doi.org/10.1007/s11222-010-9195-y