Robust mixture modeling based on scale mixtures of skew-normal distributions

https://doi.org/10.1016/j.csda.2009.09.031

Abstract

A flexible class of probability distributions, convenient for modeling data with skewness, discrepant observations and population heterogeneity, is presented. The elements of this family are convex linear combinations of densities that are scale mixtures of skew-normal distributions. An EM-type algorithm for maximum likelihood estimation is developed and the observed information matrix is obtained. These procedures are discussed with emphasis on finite mixtures of skew-normal, skew-t, skew-slash and skew contaminated normal distributions. To examine the performance of the proposed methods, simulation studies are presented that show the advantage of this flexible class in clustering heterogeneous data and that the maximum likelihood estimates based on the EM-type algorithm have good asymptotic properties. A real data set is analyzed, illustrating the usefulness of the proposed methodology.

Introduction

Finite mixtures of distributions, that is, convex linear combinations of densities (known as the mixture components), have been widely used as a powerful tool to model heterogeneous data and to approximate complicated probability densities that exhibit multimodality, skewness and heavy tails. These models have been applied in several areas, such as genetics, image processing, medicine and economics. Comprehensive surveys are available in Böhning (2000), McLachlan and Peel (2000) and, from a Bayesian point of view, in Frühwirth-Schnatter (2006).

The literature on maximum likelihood estimation of the parameters of the normal and Student-t mixture models (hereafter the FM-NOR and the FM-T models, respectively) is very extensive; see McLachlan and Peel (2000) and the references therein, Peel and McLachlan (2000), Nityasuddhi and Böhning (2003), Biernacki et al. (2003) and Dias and Wedel (2004), for example. The standard algorithm in this case is the so-called EM (Expectation-Maximization) algorithm of Dempster et al. (1977), or an extension of it such as the ECM (Meng and Rubin, 1993) or the ECME (Liu and Rubin, 1994) algorithm. For a good review, including applications in finite mixture models, see McLachlan and Krishnan (2008).

It is well known that robustness can be achieved by modeling outliers using the Student-t distribution. Finite mixtures of these distributions are useful when there is, besides discrepant observations, unobserved heterogeneity. Here, we suggest a class of models to deal with extra skewness, extending the work of Lin et al. (2007b) and Lin et al. (2007a), where finite mixtures of skew-normal (Azzalini, 1985, SN) and skew-Student-t (Azzalini and Capitanio, 2003, ST) distributions are investigated, respectively. The distributions of the mixture components are assumed to belong to a flexible class of scale mixtures of skew-normal distributions (hereafter SMSN), presented by Branco and Dey (2001). This class contains the entire family of scale mixtures of normal distributions (Andrews and Mallows, 1974). In addition, the skew-normal and skewed versions of some other classical symmetric distributions are SMSN members: the skew-t, the skew-slash (SSL) and the skew contaminated normal (SCN), for example. These distributions have heavier tails than the skew-normal (and the normal) one, and thus they seem to be a reasonable choice for robust inference.

The remainder of the paper is organized as follows. In Section 2, for the sake of completeness, we present some properties of the univariate SMSN family and the related EM-type algorithm for maximum likelihood estimation. In Section 3 we propose a finite mixture of scale mixtures of skew-normal distributions (FM-SMSN) and an EM-type algorithm for maximum likelihood estimation. The associated observed information matrix is obtained analytically in Section 4. In Section 5 we present a simulation study to show that the proposed models are robust in terms of clustering heterogeneous data and that the maximum likelihood estimates based on the EM-type algorithm do provide good asymptotic properties. Additionally, we report some model selection criteria via simulation. The methodology proposed is illustrated in Section 6, considering the analysis of a real data set.

Section snippets

Preliminaries

First, we make some remarks about the class of scale mixtures of skew-normal distributions, as introduced by Branco and Dey (2001); see also Arellano-Valle et al. (2006).

As defined by Azzalini (1985), a random variable $Z$ has a skew-normal distribution with location parameter $\mu$, scale parameter $\sigma^2$ and skewness parameter $\lambda$ if its density is given by $\psi(z) = 2\,\phi(z;\mu,\sigma^2)\,\Phi\left(\frac{\lambda(z-\mu)}{\sigma}\right)$, where $\phi(\cdot;\mu,\sigma^2)$ denotes the density of the univariate normal distribution with mean $\mu$ and variance $\sigma^2>0$ and $\Phi(\cdot)$ is the
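The skew-normal density above can be evaluated directly from its definition. The following is a minimal sketch using only the Python standard library; it assumes that $\Phi(\cdot)$ is the standard normal distribution function, as in Azzalini's definition, and the function names are illustrative rather than taken from the paper.

```python
import math

def normal_pdf(z, mu, sigma2):
    """Density of the univariate normal N(mu, sigma2) evaluated at z."""
    return math.exp(-0.5 * (z - mu) ** 2 / sigma2) / math.sqrt(2.0 * math.pi * sigma2)

def std_normal_cdf(x):
    """Standard normal distribution function, computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def skew_normal_pdf(z, mu, sigma2, lam):
    """psi(z) = 2 * phi(z; mu, sigma2) * Phi(lam * (z - mu) / sigma)."""
    sigma = math.sqrt(sigma2)
    return 2.0 * normal_pdf(z, mu, sigma2) * std_normal_cdf(lam * (z - mu) / sigma)

# With lam = 0 the skewing factor equals 1/2, so the normal density is recovered.
print(skew_normal_pdf(0.3, 0.0, 1.0, 0.0), normal_pdf(0.3, 0.0, 1.0))
```

Setting $\lambda = 0$ gives a quick sanity check, since the density then reduces to the normal one; positive values of $\lambda$ shift mass to the right of $\mu$ and negative values to the left.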

The model

The finite mixture of SMSN distributions model (FM-SMSN) is defined by considering a random sample $y=(y_1,\ldots,y_n)$ from a $g$-component mixture of SMSN densities given by $f(y_i;\Theta)=\sum_{j=1}^{g} p_j\,\psi(y_i;\theta_j)$, $p_j \geq 0$, $\sum_{j=1}^{g} p_j = 1$, $i=1,\ldots,n$, $j=1,\ldots,g$, where $\theta_j=(\mu_j,\sigma_j^2,\lambda_j,\nu_j)$ is the specific vector of parameters for the component $j$, $\psi(\cdot;\theta_j)$ is the SMSN($\theta_j$) density, $p_1,\ldots,p_g$ are the mixing probabilities and $\Theta=((p_1,\ldots,p_g),\theta_1,\ldots,\theta_g)$ is the vector with all parameters. Concerning the parameter $\nu_j$ of the mixing distribution $H(\cdot)$
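As an illustration of the mixture density $f(y_i;\Theta)$, the sketch below evaluates a finite mixture whose components are all skew-normal, the simplest SMSN member, so each $\theta_j$ is reduced to $(\mu_j,\sigma_j^2,\lambda_j)$ and no $\nu_j$ is needed. This restriction, the function names and the parameter values are assumptions made for the example, not part of the paper.

```python
import math

def sn_pdf(y, mu, sigma2, lam):
    """Skew-normal density 2 * phi(y; mu, sigma2) * Phi(lam * (y - mu) / sigma)."""
    sigma = math.sqrt(sigma2)
    z = (y - mu) / sigma
    phi = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    Phi = 0.5 * (1.0 + math.erf(lam * z / math.sqrt(2.0)))
    return 2.0 * phi * Phi

def fm_sn_pdf(y, weights, components):
    """Mixture density sum_j p_j * psi(y; theta_j) with skew-normal components."""
    if any(p < 0 for p in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing probabilities must be non-negative and sum to 1")
    return sum(p * sn_pdf(y, *theta) for p, theta in zip(weights, components))

def log_likelihood(ys, weights, components):
    """Log-likelihood of the sample ys under the mixture."""
    return sum(math.log(fm_sn_pdf(y, weights, components)) for y in ys)

# Hypothetical two-component example: (mu_j, sigma2_j, lambda_j) for j = 1, 2.
weights = [0.3, 0.7]
components = [(-2.0, 1.0, 3.0), (4.0, 2.0, -1.5)]
print(fm_sn_pdf(0.0, weights, components))
print(log_likelihood([-1.5, 0.2, 3.1, 3.8], weights, components))
```

The same structure would carry over to the other SMSN members by replacing `sn_pdf` with the corresponding component density and adding $\nu_j$ to each parameter tuple.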

The observed information matrix

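In this section we obtain the observed information matrix of the FM-SMSN model, defined as $J_o(\Theta \mid y) = -\partial^2 \ell(\Theta \mid y)/\partial\Theta\,\partial\Theta^\top$. It is well known that, under some regularity conditions, the covariance matrix of the maximum likelihood estimates $\hat{\Theta}$ can be approximated by the inverse of $J_o(\Theta \mid y)$. Following Basford et al. (1997) and Lin et al. (2007b), we evaluate $J_o(\hat{\Theta} \mid y) = \sum_{i=1}^{n} \hat{s}_i \hat{s}_i^\top$, where $\hat{s}_i = \partial \log\left(\sum_{j=1}^{g} p_j\,\psi(y_i;\theta_j)\right)/\partial\Theta \,\big|_{\Theta=\hat{\Theta}}$. We consider now the vector $\hat{s}_i$ which is partitioned into components corresponding to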
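The outer-product form $\sum_{i} \hat{s}_i \hat{s}_i^\top$ can be sketched numerically as follows. The paper obtains the individual scores analytically; this example instead approximates them by central finite differences, which is a substitute technique chosen only to keep the code short. It also assumes a two-component skew-normal mixture with the hypothetical parameter layout `[p1, mu1, sigma2_1, lam1, mu2, sigma2_2, lam2]` and $p_2 = 1 - p_1$.

```python
import math

def sn_pdf(y, mu, sigma2, lam):
    """Skew-normal density 2 * phi(y; mu, sigma2) * Phi(lam * (y - mu) / sigma)."""
    sigma = math.sqrt(sigma2)
    z = (y - mu) / sigma
    phi = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    Phi = 0.5 * (1.0 + math.erf(lam * z / math.sqrt(2.0)))
    return 2.0 * phi * Phi

def log_mixture_density(y, theta):
    """log(p1 * psi(y; theta_1) + (1 - p1) * psi(y; theta_2)); layout is an assumption."""
    p1, mu1, s1, l1, mu2, s2, l2 = theta
    return math.log(p1 * sn_pdf(y, mu1, s1, l1) + (1.0 - p1) * sn_pdf(y, mu2, s2, l2))

def score(y, theta, h=1e-5):
    """Central finite-difference approximation to the individual score s_i."""
    grad = []
    for k in range(len(theta)):
        up, down = list(theta), list(theta)
        up[k] += h
        down[k] -= h
        grad.append((log_mixture_density(y, up) - log_mixture_density(y, down)) / (2.0 * h))
    return grad

def observed_information(ys, theta):
    """Sum over observations of the outer products s_i s_i^T."""
    d = len(theta)
    J = [[0.0] * d for _ in range(d)]
    for y in ys:
        s = score(y, theta)
        for a in range(d):
            for b in range(d):
                J[a][b] += s[a] * s[b]
    return J

# Hypothetical data and parameter values, used only to exercise the functions.
ys = [-1.0, 0.2, 0.5, 3.1, 4.0, 5.2]
theta_hat = [0.4, 0.0, 1.0, 2.0, 4.0, 1.5, -1.0]
J = observed_information(ys, theta_hat)
print(len(J), len(J[0]))
```

In practice `theta_hat` would be the maximum likelihood estimate, and standard errors would be read off the diagonal of the inverse of `J`; with as few observations as in this toy example the matrix is rank-deficient and cannot be inverted.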

Simulation study

To examine the performance of the proposed method, we present some simulation studies. The first simulation study shows that the underlying FM-SMSN models are robust in their ability to cluster heterogeneous data. The second simulation study shows that the estimates obtained from our proposed ECME algorithm have good asymptotic properties. In the third study we compare some model selection criteria.
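Simulation studies of this kind require draws from the mixture. The sketch below generates data from a finite mixture of skew-normal components using the well-known stochastic representation $Z = \mu + \sigma(\delta|T_0| + \sqrt{1-\delta^2}\,T_1)$, with $\delta = \lambda/\sqrt{1+\lambda^2}$ and $T_0, T_1$ independent standard normal variables. The sample size, weights and component parameters are hypothetical and are not the designs used in the paper.

```python
import math
import random

def sample_skew_normal(rng, mu, sigma2, lam):
    """Draw one skew-normal variate via Z = mu + sigma*(delta*|T0| + sqrt(1-delta^2)*T1)."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    t0 = abs(rng.gauss(0.0, 1.0))
    t1 = rng.gauss(0.0, 1.0)
    return mu + math.sqrt(sigma2) * (delta * t0 + math.sqrt(1.0 - delta * delta) * t1)

def sample_mixture(rng, n, weights, components):
    """Draw n observations and their component labels from the mixture."""
    ys, labels = [], []
    for _ in range(n):
        # Pick the component index j with probability p_j, then draw from it.
        j = rng.choices(range(len(weights)), weights=weights)[0]
        labels.append(j)
        ys.append(sample_skew_normal(rng, *components[j]))
    return ys, labels

# Hypothetical design: two components given as (mu_j, sigma2_j, lambda_j).
rng = random.Random(2010)
ys, labels = sample_mixture(rng, 500, [0.3, 0.7], [(-2.0, 1.0, 3.0), (4.0, 2.0, -1.5)])
print(len(ys), sum(1 for j in labels if j == 0))
```

Keeping the true labels alongside the data makes it possible to measure how well a fitted model recovers the clusters, which is the kind of comparison the first simulation study describes.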

Application — The BMI data

As an application of the methodology proposed in this work, we consider the body mass index for men aged between 18 and 80 years. The data set comes from the National Health and Nutrition Examination Survey, made by the National Center for Health Statistics (NCHS) of the Center for Disease Control (CDC) in the USA. The problem of obesity has attracted attention in the last few years due to its strong relationship with many chronic diseases. Body mass index (BMI, kg/m²) has become the standard

Final conclusion

In this work we have proposed a robust approach to finite mixture modeling based on scale mixtures of skew-normal distributions. Our proposed model generalizes the recent works of Lin et al. (2007a, 2007b). This generalized robust model simultaneously accommodates multimodality, asymmetry and heavy tails, thus allowing practitioners from different areas to analyze data in an extremely flexible way. An ECME algorithm is developed by exploring the statistical properties of the class

Acknowledgments

The authors would like to thank the Associate Editor and two anonymous referees for their useful comments which substantially improved the quality of this paper. The second author acknowledges the partial financial support from Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) and CNPq-Brazil. The third author acknowledges the partial financial support from CNPq and CAPES-Brazil.

References (36)

  • A. Azzalini. The skew-normal distribution and related multivariate families. Scandinavian Journal of Statistics (2005)
  • A. Azzalini et al. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. Journal of the Royal Statistical Society, Series B (2003)
  • Z.D. Bai et al. On rates of convergence of efficient detection criteria in signal processing with white noise. IEEE Transactions on Information Theory (1989)
  • K.E. Basford et al. Standard errors of fitted component means of normal mixtures. Computational Statistics (1997)
  • C. Biernacki et al. Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence (2000)
  • D. Böhning. Computer-Assisted Analysis of Mixtures and Applications (2000)
  • A.P. Dempster et al. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (1977)
  • J.G. Dias et al. An empirical comparison of EM, SEM and MCMC performance for problematic Gaussian mixture likelihoods. Statistics and Computing (2004)