Weighted likelihood mixture modeling and model-based clustering

Greco, Luca; Agostinelli, Claudio

doi:10.1007/s11222-019-09881-1

Published: 10 June 2019

Volume 30, pages 255–277, (2020)
Statistics and Computing

Abstract

This work develops a weighted likelihood approach for the robust fitting of a mixture of multivariate Gaussian components. Two procedures are proposed, driven by suitable modifications of the standard EM and CEM algorithms, respectively. In both techniques, the M-step is enhanced by the computation of weights aimed at downweighting outliers. The weights are based on Pearson residuals stemming from robust Mahalanobis-type distances. Formal rules for robust clustering and outlier detection can also be defined based on the fitted mixture model. The behavior of the proposed methodologies is investigated through numerical studies and real data examples, in terms of fitting accuracy, classification accuracy, and outlier detection.
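
The idea of an EM iteration whose M-step is weighted by Pearson residuals can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every specific choice in it is an assumption made for illustration: the residuals compare a responsibility-weighted Gaussian kernel density estimate of the squared Mahalanobis distances with a chi-square density, the weight function uses the Hellinger residual adjustment function, and the bandwidth, the initialization, and all function names (hellinger_weight, chi2_pdf, pearson_residuals, weighted_em) are hypothetical.

```python
import math
import numpy as np


def hellinger_weight(delta):
    """Map Pearson residuals to weights in [0, 1] (assumed Hellinger RAF)."""
    u = np.maximum(np.asarray(delta, dtype=float) + 1.0, 1e-300)
    # [A(delta) + 1]^+ / (delta + 1), with A(delta) = 2 * (sqrt(delta + 1) - 1)
    return np.clip(2.0 / np.sqrt(u) - 1.0 / u, 0.0, 1.0)


def chi2_pdf(t, p):
    """Chi-square density with p degrees of freedom, floored to avoid 0/0."""
    t = np.maximum(t, 1e-12)
    log_pdf = ((p / 2 - 1) * np.log(t) - t / 2
               - (p / 2) * math.log(2) - math.lgamma(p / 2))
    return np.maximum(np.exp(log_pdf), 1e-300)


def pearson_residuals(d2, resp, p, bandwidth):
    """Responsibility-weighted KDE of squared distances versus chi-square(p)."""
    diff = (d2[:, None] - d2[None, :]) / bandwidth
    kern = np.exp(-0.5 * diff**2) / (bandwidth * math.sqrt(2 * math.pi))
    f_hat = kern @ resp / max(resp.sum(), 1e-300)
    return f_hat / chi2_pdf(d2, p) - 1.0


def weighted_em(X, init_means, n_iter=30, bandwidth=0.5):
    """EM for a Gaussian mixture with a weighted M-step (illustrative only)."""
    n, p = X.shape
    mu = np.array(init_means, dtype=float)
    K = mu.shape[0]
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(p) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: squared Mahalanobis distances and responsibilities
        d2 = np.empty((n, K))
        log_dens = np.empty((n, K))
        for j in range(K):
            diff = X - mu[j]
            d2[:, j] = np.einsum("ij,ij->i", diff @ np.linalg.inv(Sigma[j]), diff)
            _, logdet = np.linalg.slogdet(Sigma[j])
            log_dens[:, j] = (math.log(pi[j])
                              - 0.5 * (d2[:, j] + logdet + p * math.log(2 * math.pi)))
        log_dens -= log_dens.max(axis=1, keepdims=True)
        resp = np.exp(log_dens)
        resp /= resp.sum(axis=1, keepdims=True)
        # Component-wise weights from Pearson residuals of the distances
        W = np.column_stack([
            hellinger_weight(pearson_residuals(d2[:, j], resp[:, j], p, bandwidth))
            for j in range(K)
        ])
        # Weighted M-step: responsibilities are multiplied by the weights
        Z = resp * W
        for j in range(K):
            zj = Z[:, j]
            mu[j] = zj @ X / zj.sum()
            diff = X - mu[j]
            Sigma[j] = (diff * zj[:, None]).T @ diff / zj.sum() + 1e-6 * np.eye(p)
        pi = Z.sum(axis=0) / Z.sum()
    # One weight per point, from its most likely component (last E-step)
    weights = W[np.arange(n), resp.argmax(axis=1)]
    return mu, Sigma, pi, weights


# Synthetic data: two Gaussian clusters plus one gross outlier (last row)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),
               rng.normal(6, 1, (50, 2)),
               [[30.0, -30.0]]])
mu, Sigma, pi, w = weighted_em(X, init_means=X[[0, 50]])
print("estimated means:\n", mu)
print("weight of the outlier:", w[-1])
```

In this sketch a point is flagged as anomalous when its final weight is close to zero. Taking the weight from the component with the highest responsibility is a CEM-like simplification chosen here for brevity, not a rule stated in the abstract.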

Acknowledgements

The authors are grateful to the coordinating editor and two anonymous referees for their valuable suggestions.

Author information

Authors and Affiliations

DEMM Department, University of Sannio, Benevento, Italy
Luca Greco
Department of Mathematics, University of Trento, Trento, Italy
Claudio Agostinelli

Corresponding author

Correspondence to Luca Greco.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Greco, L., Agostinelli, C. Weighted likelihood mixture modeling and model-based clustering. Stat Comput 30, 255–277 (2020). https://doi.org/10.1007/s11222-019-09881-1

Received: 08 October 2018
Accepted: 04 June 2019
Published: 10 June 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s11222-019-09881-1
