Robust mixture modeling based on scale mixtures of skew-normal distributions

doi:10.1016/j.csda.2009.09.031

Computational Statistics & Data Analysis

Volume 54, Issue 12, 1 December 2010, Pages 2926-2941


Abstract

A flexible class of probability distributions, convenient for modeling data with skewness, discrepant observations and population heterogeneity, is presented. The elements of this family are convex linear combinations of densities that are scale mixtures of skew-normal distributions. An EM-type algorithm for maximum likelihood estimation is developed and the observed information matrix is obtained. These procedures are discussed with emphasis on finite mixtures of skew-normal, skew-t, skew-slash and skew contaminated normal distributions. To examine the performance of the proposed methods, simulation studies are presented; they show the advantage of this flexible class in clustering heterogeneous data and indicate that the maximum likelihood estimates based on the EM-type algorithm have good asymptotic properties. A real data set is analyzed, illustrating the usefulness of the proposed methodology.

Introduction

Finite mixtures of distributions, that is, convex linear combinations of densities (known as the mixture components), have been widely used as a powerful tool to model heterogeneous data and to approximate complicated probability densities that present multimodality, skewness and heavy tails. These models have been applied in several areas, such as genetics, image processing, medicine and economics. Comprehensive surveys are available in Böhning (2000), McLachlan and Peel (2000) and, from a Bayesian point of view, in Frühwirth-Schnatter (2006).

The literature on maximum likelihood estimation of the parameters of the normal and Student-t mixture models (hereafter the FM-NOR and the FM-T models, respectively) is very extensive; see, for example, McLachlan and Peel (2000) and the references therein, Peel and McLachlan (2000), Nityasuddhi and Böhning (2003), Biernacki et al. (2003) and Dias and Wedel (2004). The standard algorithm in this case is the EM (Expectation-Maximization) algorithm of Dempster et al. (1977), or an extension such as the ECM (Meng and Rubin, 1993) or the ECME (Liu and Rubin, 1994) algorithm. For a good review, including applications in finite mixture models, see McLachlan and Krishnan (2008).

It is well known that robustness to outliers can be achieved by modeling the data with the Student-t distribution. Finite mixtures of these distributions are useful when there is, besides discrepant observations, unobserved heterogeneity. Here, we suggest a class of models to deal with extra skewness, extending the work of Lin et al. (2007b) and Lin et al. (2007a), where finite mixtures of skew-normal (SN; Azzalini, 1985) and skew-Student-t (ST; Azzalini and Capitanio, 2003) distributions are investigated, respectively. The mixture component distributions are assumed to follow a flexible class of scale mixtures of skew-normal distributions (hereafter SMSN), presented by Branco and Dey (2001). This class contains the entire family of scale mixtures of normal distributions (Andrews and Mallows, 1974). In addition, the skew-normal and skewed versions of some other classical symmetric distributions are SMSN members: the skew-t, the skew-slash (SSL) and the skew contaminated normal (SCN), for example. These distributions have heavier tails than the skew-normal (and the normal) distribution, and thus they seem to be a reasonable choice for robust inference.
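To make the idea of a scale mixture of skew-normals concrete, the sketch below draws from a skew-normal and from a skew-t distribution. It relies on two representations that the text above does not spell out and that are assumed here from the general SMSN literature: a skew-normal draw written as a combination of a half-normal and an independent normal variable, and an SMSN draw written as $μ + U^{-1/2} Z$, with $Z$ skew-normal and $U$ a positive mixing variable (Gamma with shape and rate $ν/2$ for the skew-t). The function names are hypothetical.

```python
import math
import random


def sample_skew_normal(rng, mu, sigma2, lam):
    """Draw one value from SN(mu, sigma2, lam).

    Assumed representation: delta * |T0| + sqrt(1 - delta^2) * T1, with
    T0, T1 independent standard normals and delta = lam / sqrt(1 + lam^2).
    """
    delta = lam / math.sqrt(1.0 + lam * lam)
    t0 = abs(rng.gauss(0.0, 1.0))
    t1 = rng.gauss(0.0, 1.0)
    return mu + math.sqrt(sigma2) * (delta * t0 + math.sqrt(1.0 - delta * delta) * t1)


def sample_skew_t(rng, mu, sigma2, lam, nu):
    """Draw one value from a skew-t distribution as a scale mixture.

    Assumption: U ~ Gamma(shape=nu/2, rate=nu/2). random.gammavariate takes
    (shape, scale), so the scale argument is 2/nu.
    """
    u = rng.gammavariate(nu / 2.0, 2.0 / nu)
    z = sample_skew_normal(rng, 0.0, sigma2, lam)
    return mu + z / math.sqrt(u)


if __name__ == "__main__":
    rng = random.Random(2010)
    print([round(sample_skew_t(rng, 0.0, 1.0, 3.0, 4.0), 3) for _ in range(5)])
```

Small values of `u` inflate the scale of a draw, which is how the heavier tails of the skew-t arise in this construction.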

The remainder of the paper is organized as follows. In Section 2, for the sake of completeness, we present some properties of the univariate SMSN family and the related EM-type algorithm for maximum likelihood estimation. In Section 3 we propose a finite mixture of scale mixtures of skew-normal distributions (FM-SMSN) and an EM-type algorithm for maximum likelihood estimation. The associated observed information matrix is obtained analytically in Section 4. In Section 5 we present a simulation study to show that the proposed models are robust in terms of clustering heterogeneous data and that the maximum likelihood estimates based on the EM-type algorithm have good asymptotic properties. Additionally, we report some model selection criteria via simulation. The proposed methodology is illustrated in Section 6 through the analysis of a real data set.

Section snippets

Preliminaries

First, we make some remarks about the class of scale mixtures of skew-normal distributions, as introduced by Branco and Dey (2001); see also Arellano-Valle et al. (2006).

As defined by Azzalini (1985), a random variable $Z$ has a skew-normal distribution with location parameter $μ$, scale parameter $σ^{2}$ and skewness parameter $λ$ if its density is given by $ψ(z) = 2 ϕ(z; μ, σ^{2}) Φ\left(\frac{λ (z - μ)}{σ}\right),$ where $ϕ(\cdot; μ, σ^{2})$ denotes the density of the univariate normal distribution with mean $μ$ and variance $σ^{2} > 0$ and $Φ(\cdot)$ is the
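The density above can be evaluated directly. The following is a minimal sketch, assuming that $Φ(\cdot)$ denotes the standard normal cumulative distribution function (the snippet is cut off before it says so); the function names are hypothetical and only the Python standard library is used.

```python
import math


def normal_pdf(z, mu, sigma2):
    """Density of N(mu, sigma2) at z."""
    return math.exp(-0.5 * (z - mu) ** 2 / sigma2) / math.sqrt(2.0 * math.pi * sigma2)


def std_normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def skew_normal_pdf(z, mu=0.0, sigma2=1.0, lam=0.0):
    """psi(z) = 2 * phi(z; mu, sigma2) * Phi(lam * (z - mu) / sigma)."""
    sigma = math.sqrt(sigma2)
    return 2.0 * normal_pdf(z, mu, sigma2) * std_normal_cdf(lam * (z - mu) / sigma)


if __name__ == "__main__":
    # With lam = 0 the skewing factor equals 1/2 and the normal density is recovered.
    print(skew_normal_pdf(0.3, lam=0.0), normal_pdf(0.3, 0.0, 1.0))
    # A positive lam moves mass to the right of mu.
    print(skew_normal_pdf(1.0, lam=3.0), skew_normal_pdf(-1.0, lam=3.0))
```

Setting `lam=0` is a quick sanity check, since the normal distribution is the symmetric special case of the skew-normal.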

The model

The finite mixture of SMSN distributions model (FM-SMSN) is defined by considering a random sample $y = {(y_{1}, \dots, y_{n})}^{⊤}$ from a $g$-component mixture of SMSN densities given by $f(y_{i}; Θ) = \sum_{j = 1}^{g} p_{j} ψ(y_{i}; θ_{j}), \quad p_{j} \geq 0, \quad \sum_{j = 1}^{g} p_{j} = 1, \quad i = 1, \dots, n, \quad j = 1, \dots, g,$ where $θ_{j} = {(μ_{j}, σ_{j}^{2}, λ_{j}, ν_{j}^{⊤})}^{⊤}$ is the specific vector of parameters for the component $j$, $ψ(\cdot; θ_{j})$ is the SMSN$(θ_{j})$ density, $p_{1}, \dots, p_{g}$ are the mixing probabilities and $Θ = {({(p_{1}, \dots, p_{g})}^{⊤}, θ_{1}^{⊤}, \dots, θ_{g}^{⊤})}^{⊤}$ is the vector with all parameters. Concerning the parameter $ν_{j}$ of the mixing distribution $H(\cdot)$
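As an illustration of the mixture density $f(y_{i}; Θ)$, the sketch below evaluates it and the corresponding log-likelihood for the simplest SMSN member, the skew-normal, so each component has only $(μ_{j}, σ_{j}^{2}, λ_{j})$ and no $ν_{j}$. This restriction, the function names and the numerical values are assumptions made for the example.

```python
import math


def skew_normal_pdf(y, mu, sigma2, lam):
    """Skew-normal density 2 * phi(y; mu, sigma2) * Phi(lam * (y - mu) / sigma)."""
    sigma = math.sqrt(sigma2)
    phi = math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    big_phi = 0.5 * (1.0 + math.erf(lam * (y - mu) / (sigma * math.sqrt(2.0))))
    return 2.0 * phi * big_phi


def mixture_pdf(y, weights, params):
    """f(y; Theta) = sum_j p_j * psi(y; theta_j), with theta_j = (mu, sigma2, lam)."""
    if len(weights) != len(params):
        raise ValueError("one weight is needed per component")
    if any(p < 0.0 for p in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing probabilities must be non-negative and sum to 1")
    return sum(p * skew_normal_pdf(y, *theta) for p, theta in zip(weights, params))


def log_likelihood(ys, weights, params):
    """Log-likelihood of an i.i.d. sample under the mixture."""
    return sum(math.log(mixture_pdf(y, weights, params)) for y in ys)


if __name__ == "__main__":
    weights = [0.4, 0.6]                           # p_1, p_2
    params = [(0.0, 1.0, 0.0), (5.0, 2.0, -1.0)]   # (mu_j, sigma2_j, lam_j)
    print(mixture_pdf(1.0, weights, params))
    print(log_likelihood([0.2, 4.1, 5.3], weights, params))
```

The explicit check on the weights mirrors the constraints $p_{j} \geq 0$ and $\sum_{j} p_{j} = 1$ stated with the model.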

The observed information matrix

In this section we obtain the observed information matrix of the FM-SMSN model, defined as $J_{o}(Θ | y) = - \partial^{2} ℓ(Θ | y) / \partial Θ \partial Θ^{⊤}.$ It is well known that, under some regularity conditions, the covariance matrix of the maximum likelihood estimates $\hat{Θ}$ can be approximated by the inverse of $J_{o}(Θ | y)$. Following Basford et al. (1997) and Lin et al. (2007b), we evaluate $J_{o}(\hat{Θ} | y) = \sum_{i = 1}^{n} {\hat{s}}_{i} {\hat{s}}_{i}^{⊤},$ where ${\hat{s}}_{i} = \partial (\log \sum_{j = 1}^{g} p_{j} ψ(y_{i}; θ_{j})) / \partial Θ |_{Θ = \hat{Θ}}.$ We consider now the vector ${\hat{s}}_{i}$ which is partitioned into components corresponding to
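The approximation of the information matrix by a sum of outer products of per-observation score vectors can be sketched numerically. The paper derives the scores analytically; the sketch below instead uses central finite differences, which is a substitute technique chosen for brevity. For illustration the log-density is that of a single normal distribution rather than the FM-SMSN mixture, and all function names are hypothetical.

```python
import math


def numerical_score(log_density, y, theta, eps=1e-5):
    """Central-difference gradient of log_density(y, theta) with respect to theta."""
    score = []
    for k in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[k] += eps
        down[k] -= eps
        score.append((log_density(y, up) - log_density(y, down)) / (2.0 * eps))
    return score


def empirical_information(log_density, ys, theta_hat, eps=1e-5):
    """Sum over observations of the outer product s_i s_i^T, evaluated at theta_hat."""
    d = len(theta_hat)
    info = [[0.0] * d for _ in range(d)]
    for y in ys:
        s = numerical_score(log_density, y, theta_hat, eps)
        for a in range(d):
            for b in range(d):
                info[a][b] += s[a] * s[b]
    return info


def normal_log_density(y, theta):
    """Stand-in log-density with theta = (mu, sigma2); not the FM-SMSN model."""
    mu, sigma2 = theta
    return -0.5 * math.log(2.0 * math.pi * sigma2) - 0.5 * (y - mu) ** 2 / sigma2


if __name__ == "__main__":
    ys = [-1.0, 0.0, 2.0, 0.7, 1.4]
    mu_hat = sum(ys) / len(ys)
    sigma2_hat = sum((y - mu_hat) ** 2 for y in ys) / len(ys)
    for row in empirical_information(normal_log_density, ys, [mu_hat, sigma2_hat]):
        print(row)
```

Inverting the resulting matrix would give approximate standard errors; any mixture log-density with a free parameter vector could be passed in place of `normal_log_density`.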

Simulation study

To examine the performance of the proposed method, we present three simulation studies. The first shows that the FM-SMSN models cluster heterogeneous data robustly. The second shows that the estimates produced by the proposed ECME algorithm have good asymptotic properties. In the third study we compare some model selection criteria.
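The snippet does not say which model selection criteria are compared. As a generic illustration only, the sketch below computes two widely used likelihood-based criteria, AIC and BIC, and picks the candidate with the smallest value; the choice of these two criteria, the function names and the log-likelihood values are assumptions, not results from the paper.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: -2 * loglik + 2 * k."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: -2 * loglik + k * log(n)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def select_model(scores):
    """Return the label of the candidate with the smallest criterion value."""
    return min(scores, key=scores.get)


if __name__ == "__main__":
    n_obs = 500
    # Hypothetical maximized log-likelihoods for g = 1, 2, 3 skew-normal components.
    # A g-component skew-normal mixture has 3g component parameters plus g - 1 free weights.
    fits = {1: -1310.0, 2: -1255.0, 3: -1252.5}
    bic_scores = {g: bic(ll, 4 * g - 1, n_obs) for g, ll in fits.items()}
    print(bic_scores, "selected g =", select_model(bic_scores))
```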

Application — The BMI data

As an application of the methodology proposed in this work, we consider the body mass index for men aged between 18 and 80 years. The data set comes from the National Health and Nutrition Examination Survey, conducted by the National Center for Health Statistics (NCHS) of the Centers for Disease Control and Prevention (CDC) in the USA. The problem of obesity has attracted attention in the last few years due to its strong relationship with many chronic diseases. Body mass index (BMI, kg/m²) has become the standard

Final conclusion

In this work we have proposed a robust approach to finite mixture modeling based on scale mixtures of skew-normal distributions. Our proposed model generalizes the recent works of Lin et al. (2007a) and Lin et al. (2007b). This generalized robust model simultaneously accommodates multimodality, asymmetry and heavy tails, thus allowing practitioners from different areas to analyze data in an extremely flexible way. An ECME algorithm is developed by exploring the statistical properties of the class

Acknowledgments

The authors would like to thank the Associate Editor and two anonymous referees for their useful comments which substantially improved the quality of this paper. The second author acknowledges the partial financial support from Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) and CNPq-Brazil. The third author acknowledges the partial financial support from CNPq and CAPES-Brazil.

References (36)

C. Biernacki et al. Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Computational Statistics & Data Analysis (2003)
M.D. Branco et al. A general class of multivariate skew-elliptical distributions. Journal of Multivariate Analysis (2001)
T.I. Lin. Maximum likelihood estimation for multivariate skew normal mixture models. Journal of Multivariate Analysis (2009)
D. Nityasuddhi et al. Asymptotic properties of the EM algorithm estimate for normal mixture models with component specific variances. Computational Statistics & Data Analysis (2003)
J. Wang et al. The multivariate skew-slash distribution. Journal of Statistical Planning and Inference (2006)
H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control (1974)
D.F. Andrews et al. Scale mixtures of normal distributions. Journal of the Royal Statistical Society, Series B (1974)
R.B. Arellano-Valle et al. A unified view on skewed distributions arising from selections. Canadian Journal of Statistics (2006)
B.C. Arnold et al. The nontruncated marginal of a truncated bivariate normal distribution. Psychometrika (1993)
A. Azzalini. A class of distributions which includes the normal ones. Scandinavian Journal of Statistics (1985)
A. Azzalini. The skew-normal distribution and related multivariate families. Scandinavian Journal of Statistics (2005)
A. Azzalini et al. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. Journal of the Royal Statistical Society, Series B (2003)
Z.D. Bai et al. On rates of convergence of efficient detection criteria in signal processing with white noise. IEEE Transactions on Information Theory (1989)
K.E. Basford et al. Standard errors of fitted component means of normal mixtures. Computational Statistics (1997)
C. Biernacki et al. Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence (2000)
D. Böhning. Computer-Assisted Analysis of Mixtures and Applications (2000)
A.P. Dempster et al. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (1977)
J.G. Dias et al. An empirical comparison of EM, SEM and MCMC performance for problematic Gaussian mixture likelihoods. Statistics and Computing (2004)

