Abstract
A weighted likelihood approach for robust fitting of a mixture of multivariate Gaussian components is developed in this work. Two approaches have been proposed that are driven by a suitable modification of the standard EM and CEM algorithms, respectively. In both techniques, the M-step is enhanced by the computation of weights aimed at downweighting outliers. The weights are based on Pearson residuals stemming from robust Mahalanobis-type distances. Formal rules for robust clustering and outlier detection can be also defined based on the fitted mixture model. The behavior of the proposed methodologies has been investigated by numerical studies and real data examples in terms of both fitting and classification accuracy and outlier detection.










Similar content being viewed by others
Explore related subjects
Discover the latest articles and news from researchers in related subjects, suggested using machine learning.References
Agostinelli, C.: Robust model selection in regression via weighted likelihood methodology. Stat. Probab. Lett. 56(3), 289–300 (2002)
Agostinelli, C.: Notes on pearson residuals and weighted likelihood estimating equations. Stat. Probab. Lett. 76(17), 1930–1934 (2006)
Agostinelli, C., Greco, L.: A weighted strategy to handle likelihood uncertainty in Bayesian inference. Comput. Stat. 28(1), 319–339 (2013)
Agostinelli, C., Greco, L.: Discussion on “The power of monitoring: how to make the most of a contaminated sample”. Stat. Methods Appl. (2017). https://doi.org/10.1007/s10260-017-0416-9
Agostinelli, C., Greco, L.: Weighted likelihood estimation of multivariate location and scatter. Test (2018). https://doi.org/10.1007/s11749-018-0596-0
Atkinson, A., Riani, M., Cerioli, A.: Exploring Multivariate Data with the Forward Search. Springer, Berlin (2013)
Basu, A., Lindsay, B.: Minimum disparity estimation for continuous models: efficiency, distributions and robustness. Ann. Inst. Stat. Math. 46(4), 683–705 (1994)
Bouveyron, C., Brunet-Saumard, C.: Model-based clustering of high-dimensional data: a review. Comput. Stat. Data Anal. 71, 52–78 (2014)
Bryant, P.: Large-sample results for optimization-based clustering methods. J. Classif. 8(1), 31–44 (1991)
Campbell, N.: Mixture models and atypical values. Math. Geol. 16(5), 465–477 (1984)
Celeux, G., Govaert, G.: Comparison of the mixture and the classification maximum likelihood in cluster analysis. J. Stat. Comput. Simul. 47(3–4), 127–146 (1993)
Cerioli, A.: Multivariate outlier detection with high-breakdown estimators. J. Am. Stat. Assoc. 105(489), 147–156 (2010)
Cerioli, A., Farcomeni, A.: Error rates for multivariate outlier detection. Comput. Stat. Data Anal. 55(1), 544–553 (2011)
Cerioli, A., Riani, M., Atkinson, A., Corbellini, A.: The power of monitoring: how to make the most of a contaminated sample. Stat. Methods Appl. (2017). https://doi.org/10.1007/s10260-017-0409-8
Colonna, J.G., Gama, J., Nakamura, E.: Recognizing Family, Genus, and Species of Anuran Using a Hierarchical Classification Approach. Lecture Notes in Computer Science, pp. 198–212. Springer, Berlin (2016)
Coretto, P., Hennig, C.: Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for robust gaussian clustering. J. Am. Stat. Assoc. 111(516), 1648–1659 (2016)
Coretto, P., Hennig, C.: Consistency, breakdown robustness, and algorithms for robust improper maximum likelihood clustering. J. Mach. Learn. Res. 18(1), 5199–5237 (2017)
Day, N.: Estimating the components of a mixture of normal distributions. Biometrika 56(3), 463–474 (1969)
Dempster, A., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B Methodol. 39, 1–38 (1977)
Dotto, F., Farcomeni, A.: Robust inference for parsimonious model-based clustering. J. Stat. Comput. Simul. 89(3), 414–442 (2019)
Dotto, F., Farcomeni, A., Garcia-Escudero, L.A., Mayo-Iscar, A.: A reweighting approach to robust clustering. Stat. Comput. 28(2), 477–493 (2016)
Elashoff, M., Ryan, L.: An em algorithm for estimating equations. J. Comput. Graph. Stat. 13(1), 48–65 (2004)
Farcomeni, A., Greco, L.: Robust Methods for Data Reduction. CRC Press, Boca Raton (2015a)
Farcomeni, A., Greco, L.: S-estimation of hidden Markov models. Comput. Stat. 30(1), 57–80 (2015b)
Fraley, C., Raftery, A.: How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput. J. 41(8), 578–588 (1998)
Fraley, C., Raftery, A.: Model-based clustering, discriminant analysis, and density estimation. J. Am. Stat. Assoc. 97(458), 611–631 (2002)
Fraley, C., Raftery, A., Murphy, T., Scrucca, L.: mclust version 4 for r: normal mixture modeling for model-based clustering, classification, and density estimation. Technical Report 597, University of Washington, Seattle (2012)
Fritz, H., Garcia-Escudero, L., Mayo-Iscar, A.: A fast algorithm for robust constrained clustering. Comput. Stat. Data Anal. 61, 124–136 (2013)
Garcia-Escudero, L., Gordaliza, A., Matran, C., Mayo-Iscar, A.: A general trimming approach to robust cluster analysis. Ann. Stat. 36, 1324–1345 (2008)
García-Escudero, L.A., Gordaliza, A., Matrán, C., Mayo-Iscar, A.: Exploring the number of groups in robust model-based clustering. Stat. Comput. 21(4), 585–599 (2011)
Garcia-Escudero, L., Gordaliza, A., Matran, C., Mayo-Iscar, A.: Avoiding spurious local maximizers in mixture modeling. Stat. Comput. 25(3), 619–633 (2015)
Greco, L.: Weighted likelihood based inference for \(p (x< y)\). Commun. Stat. Simul. Comput. 46(10), 7777–7789 (2017)
Helliwell, J., Layard, R., Sachs, J.: World Happiness Report 2018 (2018)
Kuchibhotla, A., Basu, A.: A general set up for minimum disparity estimation. Stat. Probab. Lett. 96, 68–74 (2015)
Kuchibhotla, A., Basu, A.: A minimum distance weighted likelihood method of estimation. Technical report, Interdisciplinary Statistical Research Unit (ISRU), Indian Statistical Institute, Kolkata, India (2018). https://faculty.wharton.upenn.edu/wp-content/uploads/2018/02/attemptv4p1.pdf. Accessed 17 Jan 2018
Lee, S., McLachlan, G.: Finite mixtures of multivariate skew t-distributions: some recent and new results. Stat. Comput. 24(2), 181–202 (2014)
Lin, T.: Robust mixture modeling using multivariate skew t distributions. Stat. Comput. 20(3), 343–356 (2010)
Markatou, M.: Mixture models, robustness, and the weighted likelihood methodology. Biometrics 56(2), 483–486 (2000)
Markatou, M., Basu, A., Lindsay, B.G.: Weighted likelihood equations with bootstrap root search. J. Am. Stat. Assoc. 93(442), 740–750 (1998)
Maronna, R., Jacovkis, P.: Multivariate clustering procedures with variable metrics. Biometrics 30(3), 499–505 (1974)
McLachlan, G., Peel, D.: Finite Mixture Models. Wiley, New York (2004)
McLachlan, G.J., Peel, D., Bean, R.: Modelling high-dimensional data by mixtures of factor analyzers. Comput. Stat. Data Anal. 41(3–4), 379–388 (2003)
Neykov, N., Filzmoser, P., Dimova, R., Neytchev, P.: Robust fitting of mixtures using the trimmed likelihood estimator. Comput. Stat. Data Anal. 52(1), 299–308 (2007)
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2019). https://www.R-project.org/
Rousseeuw, P., Van Zomeren, B.: Unmasking multivariate outliers and leverage points. J. Am. Stat. Assoc. 85(411), 633–639 (1990)
Symon, M.: Clustering criterion and multi-variate normal mixture. Biometrics 77, 35–43 (1977)
Acknowledgements
The authors are grateful to the coordinating editor and two anonymous referees for their valuable suggestions.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Greco, L., Agostinelli, C. Weighted likelihood mixture modeling and model-based clustering. Stat Comput 30, 255–277 (2020). https://doi.org/10.1007/s11222-019-09881-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11222-019-09881-1
Keywords
- Classification
- EM
- Mixture
- Multivariate normal
- Outlier detection
- Pearson residuals
- Robustness
- Weighted likelihood