Weighted likelihood mixture modeling and model-based clustering

Statistics and Computing

Abstract

This work develops a weighted likelihood approach for robust fitting of a mixture of multivariate Gaussian components. Two algorithms are proposed, obtained by suitably modifying the standard EM and CEM algorithms, respectively. In both, the M-step is enhanced by the computation of weights that downweight outliers. The weights are based on Pearson residuals stemming from robust Mahalanobis-type distances. Formal rules for robust clustering and outlier detection can also be defined based on the fitted mixture model. The behavior of the proposed methods is investigated through numerical studies and real data examples, in terms of fitting accuracy, classification accuracy, and outlier detection.
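To make the idea of a weight-enhanced M-step concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single weight per observation and uses the Hellinger residual adjustment function, a common choice in the weighted likelihood literature, to turn Pearson residuals into weights. The function names `hellinger_weights` and `weighted_m_step` are hypothetical. The computation of the Pearson residuals themselves, which the abstract says stem from robust Mahalanobis-type distances, is not reproduced here and is taken as an input.

```python
import numpy as np


def hellinger_weights(delta):
    """Map Pearson residuals (assumed >= -1) to weights in [0, 1].

    Uses w = min(1, max(A(delta) + 1, 0) / (delta + 1)) with the Hellinger
    residual adjustment function A(delta) = 2 * (sqrt(delta + 1) - 1).
    This is an illustrative choice, not necessarily the one used in the paper.
    """
    delta = np.asarray(delta, dtype=float)
    raf = 2.0 * (np.sqrt(delta + 1.0) - 1.0)
    num = np.maximum(raf + 1.0, 0.0)
    den = np.maximum(delta + 1.0, 1e-12)  # guard against division by zero
    return np.minimum(1.0, num / den)


def weighted_m_step(X, tau, w):
    """One weighted M-step for a K-component Gaussian mixture.

    X   : (n, p) data matrix
    tau : (n, K) posterior membership probabilities from the E-step
    w   : (n,)   observation weights in [0, 1] (assumption: one per point)
    Returns mixing proportions (K,), means (K, p), covariances (K, p, p).
    """
    X = np.asarray(X, dtype=float)
    tau = np.asarray(tau, dtype=float)
    w = np.asarray(w, dtype=float)

    wt = w[:, None] * tau            # weighted responsibilities, (n, K)
    nk = wt.sum(axis=0)              # effective component sizes, (K,)
    pi = nk / nk.sum()               # mixing proportions
    mu = (wt.T @ X) / nk[:, None]    # weighted component means, (K, p)

    K, p = mu.shape
    sigma = np.empty((K, p, p))
    for k in range(K):
        R = X - mu[k]                # residuals from the k-th mean
        sigma[k] = (wt[:, k, None] * R).T @ R / nk[k]
    return pi, mu, sigma
```

With all weights equal to one, this reduces to the usual EM M-step for a Gaussian mixture. A point with weight zero contributes nothing to the proportions, means, or covariances, which is the downweighting mechanism the abstract describes.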



Acknowledgements

The authors are grateful to the coordinating editor and two anonymous referees for their valuable suggestions.

Author information

Corresponding author

Correspondence to Luca Greco.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Greco, L., Agostinelli, C. Weighted likelihood mixture modeling and model-based clustering. Stat Comput 30, 255–277 (2020). https://doi.org/10.1007/s11222-019-09881-1


  • DOI: https://doi.org/10.1007/s11222-019-09881-1
