Abstract
In this paper, we introduce an unrestricted skew-normal generalized hyperbolic (SUNGH) distribution for use in finite mixture modeling or clustering problems. The SUNGH is a broad class of flexible distributions that includes various other well-known asymmetric and symmetric families, such as the scale mixtures of skew-normal distributions, the skew-normal generalized hyperbolic distribution and their corresponding symmetric versions. The class provides a much-needed unified framework in which the choice of the best-fitting distribution can proceed quite naturally, either through parameter estimation or by placing constraints on specific parameters and assessing the constrained models through model choice criteria. The class has several desirable properties, including an analytically tractable density and ease of computation for simulation and parameter estimation. We illustrate the flexibility of the proposed class in a mixture modeling context using a Bayesian framework and assess its performance using simulated and real data.



Acknowledgements
The authors would like to thank the coordinating editor and anonymous reviewers for their suggestions, corrections and encouragement, which helped us to improve earlier versions of the manuscript.
Appendix
A.1. Proof of Propositions 1 to 6
In this appendix, we prove Propositions 1 to 6.
Proof of Proposition 1
By considering (7),
(a):
(b):
\(\square \)
Proof of Proposition 2
By considering the stochastic representation (7) and the fact that the components of \(\varvec{W}_{0} \) (and hence of \({\varvec{W}}\)) are uncorrelated, the result follows. In the case of \(\varvec{\Lambda }^{*}=\left( {{\begin{array}{cc} {\varvec{\Lambda }_{p\times q} }&{} {\mathbf{0}_{p\times m} } \\ \end{array} }} \right) \), relation (7) for \({\varvec{Y}}\sim \mathrm{SUNGH}_{p,q+m} \left( {{\varvec{\mu }} ,\varvec{\Sigma },\varvec{\Lambda }^{*},{\varvec{\varpi }} } \right) \) is equivalent to \({\varvec{Y}}={\varvec{\mu }} +\varvec{\Lambda }^{*}{\varvec{W}}+\kappa \left( U \right) ^{1/2}\varvec{\Sigma }^{1/2}{\varvec{W}}_1 ={\varvec{\mu }} +\varvec{\Lambda }{\varvec{W}}^{\left( 1 \right) }+\kappa \left( U \right) ^{1/2}\varvec{\Sigma }^{1/2}{\varvec{W}}_1 \), where \({\varvec{W}}^{\left( 1 \right) }\) consists of the first \(q\) components of \({\varvec{W}}\). Similarly, in the case of \(\varvec{\Lambda }^{*}=\left( {{\begin{array}{cc} {\mathbf{0}_{p\times m} }&{} {\varvec{\Lambda }_{p\times q} } \\ \end{array} }} \right) \), relation (7) for \({\varvec{Y}}\sim \mathrm{SUNGH}_{p,q+m} \left( {{\varvec{\mu }} ,\varvec{\Sigma },\varvec{\Lambda }^{*},{\varvec{\varpi }} } \right) \) is equivalent to \({\varvec{Y}}={\varvec{\mu }} +\varvec{\Lambda }^{*}{\varvec{W}}+\kappa \left( U \right) ^{1/2}\varvec{\Sigma }^{1/2}{\varvec{W}}_1 ={\varvec{\mu }} +\varvec{\Lambda }{\varvec{W}}^{\left( 2 \right) }+\kappa \left( U \right) ^{1/2}\varvec{\Sigma }^{1/2}{\varvec{W}}_1 \), where \({\varvec{W}}^{\left( 2 \right) }\) consists of the last \(q\) components of \({\varvec{W}}\). \(\square \)
Proof of Proposition 3
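By considering the stochastic representation (7), we have that \({\varvec{b}}+{\varvec{BY}}={\varvec{b}}+{\varvec{B}}{\varvec{\mu }} +{\varvec{B}}\varvec{\Lambda }{\varvec{W}}+\kappa \left( U \right) ^{1/2}\left( {{\varvec{B}}\varvec{\Sigma }{\varvec{B}}^{\top }} \right) ^{1/2}{\varvec{W}}_1 \), where the equality holds in distribution. This is again of the form (7), with location \({\varvec{b}}+{\varvec{B}}{\varvec{\mu }} \), scale matrix \({\varvec{B}}\varvec{\Sigma }{\varvec{B}}^{\top }\) and skewness matrix \({\varvec{B}}\varvec{\Lambda }\). \(\square \)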
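The replacement of \({\varvec{B}}\varvec{\Sigma }^{1/2}{\varvec{W}}_1 \) by \(\left( {{\varvec{B}}\varvec{\Sigma }{\varvec{B}}^{\top }} \right) ^{1/2}{\varvec{W}}_1 \) (with a standard normal vector of matching dimension) rests on both Gaussian terms having covariance \({\varvec{B}}\varvec{\Sigma }{\varvec{B}}^{\top }\). The following is a minimal numerical sketch of that fact only, not of the full proof. The dimensions, the random matrices and the helper `sym_sqrt` are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def sym_sqrt(mat):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T

rng = np.random.default_rng(0)
p, r = 4, 2                       # illustrative dimensions (assumed)
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)   # arbitrary positive definite scale matrix
B = rng.normal(size=(r, p))       # arbitrary r x p transformation matrix

root = sym_sqrt(Sigma)
cov_left = (B @ root) @ (B @ root).T    # covariance of B Sigma^{1/2} W_1
new_root = sym_sqrt(B @ Sigma @ B.T)
cov_right = new_root @ new_root.T       # covariance of (B Sigma B^T)^{1/2} W_1

covariances_match = np.allclose(cov_left, cov_right)
print(covariances_match)
```

Since a zero-mean Gaussian vector is determined by its covariance, matching covariances are enough for the two terms to agree in distribution.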
Proof of Proposition 4
By considering Proposition 3, with \({\varvec{b}}=\mathbf{0}\) and the matrix \({\varvec{B}}\) in the form of \(\left( {{\begin{array}{cc} {{\varvec{I}}_{p_1 } }&{} {{\varvec{0}}_{p_1 \times p_2 } } \\ \end{array} }} \right) \) or \(\left( {{\begin{array}{cc} {{\varvec{0}}_{p_2 \times p_1 } }&{} {{\varvec{I}}_{p_2 } } \\ \end{array} }} \right) \), respectively, the result follows. \(\square \)
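As a small illustration of how these two choices of \({\varvec{B}}\) act, the sketch below builds both selection matrices and checks that they pick out the corresponding sub-blocks of the location vector, the skewness matrix and the scale matrix. All numerical values and dimensions are arbitrary assumptions chosen for the example.

```python
import numpy as np

p1, p2, q = 2, 3, 2               # illustrative dimensions (assumed)
p = p1 + p2
rng = np.random.default_rng(1)
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)   # arbitrary positive definite scale matrix
Lam = rng.normal(size=(p, q))     # arbitrary skewness matrix
mu = rng.normal(size=p)           # arbitrary location vector

B1 = np.hstack([np.eye(p1), np.zeros((p1, p2))])   # (I_{p1}  0)
B2 = np.hstack([np.zeros((p2, p1)), np.eye(p2)])   # (0  I_{p2})

# Parameters of the two marginals obtained through Proposition 3 with b = 0.
mu1, Lam1, Sigma11 = B1 @ mu, B1 @ Lam, B1 @ Sigma @ B1.T
mu2, Lam2, Sigma22 = B2 @ mu, B2 @ Lam, B2 @ Sigma @ B2.T

print(np.allclose(Sigma11, Sigma[:p1, :p1]), np.allclose(Sigma22, Sigma[p1:, p1:]))
```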
Proof of Proposition 5
Since \({\varvec{Y}}=\left( {{\varvec{Y}}_1^\top ,{\varvec{Y}}_2^\top } \right) ^{\top }\), from part (b) of Proposition 1 we have \(\hbox {Var}\left[ {\varvec{Y}} \right] =\left( {\hbox {Cov}\left( {{\varvec{Y}}_i ,{\varvec{Y}}_j } \right) } \right) _{i,j=1,2} =\left( {\varvec{\Sigma }_{ij} +\varvec{\Lambda }_i \left[ {\left( {k_2 -k_1^2 } \right) \frac{2}{\pi }\mathbf{1}_q \mathbf{1}_q^\top -\frac{2}{\pi }k_2 {\varvec{I}}_q } \right] \varvec{\Lambda }_j^\top } \right) \). Thus, if \(\varvec{\Sigma }_{12} =\mathbf{0}\), then \(\hbox {Cov}\left( {{\varvec{Y}}_1 ,{\varvec{Y}}_2 } \right) =\varvec{\Lambda }_1 \left[ {\left( {k_2 -k_1^2 } \right) \frac{2}{\pi }\mathbf{1}_q \mathbf{1}_q^\top -\frac{2}{\pi }k_2 {\varvec{I}}_q } \right] \varvec{\Lambda }_2^\top \), so that either of the conditions \(\varvec{\Lambda }_1 =\mathbf{0}\) or \(\varvec{\Lambda }_2 =\mathbf{0}\) implies \(\hbox {Cov}\left( {{\varvec{Y}}_1 ,{\varvec{Y}}_2 } \right) =\mathbf{0}\). \(\square \)
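The cross-covariance term can be evaluated directly. The sketch below codes the bracketed expression exactly as it is written in this proof and evaluates it for one generic pair of skewness blocks and for the case \(\varvec{\Lambda }_1 =\mathbf{0}\). The function name `skew_cross_cov`, the values of \(k_1 \) and \(k_2 \), and the small matrices are hypothetical choices made only for illustration.

```python
import numpy as np

def skew_cross_cov(Lam_i, Lam_j, k1, k2):
    """Lam_i [ (k2 - k1^2)(2/pi) 1 1^T - (2/pi) k2 I ] Lam_j^T, as written in the proof."""
    q = Lam_i.shape[1]
    ones = np.ones((q, 1))
    inner = (k2 - k1**2) * (2 / np.pi) * (ones @ ones.T) - (2 / np.pi) * k2 * np.eye(q)
    return Lam_i @ inner @ Lam_j.T

k1, k2 = 0.9, 1.0                            # hypothetical moment constants
Lam1 = np.array([[1.0, 0.0], [0.0, 1.0]])    # p1 x q block (illustrative)
Lam2 = np.array([[1.0, 1.0]])                # p2 x q block (illustrative)

# With Sigma_12 = 0, Cov(Y_1, Y_2) reduces to this skewness contribution.
cov_generic = skew_cross_cov(Lam1, Lam2, k1, k2)
cov_zero = skew_cross_cov(np.zeros_like(Lam1), Lam2, k1, k2)

print(cov_generic)
print(cov_zero)
```

For these values the generic case gives a non-zero cross-covariance, while setting \(\varvec{\Lambda }_1 =\mathbf{0}\) makes it vanish, in line with the statement of the proposition.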
Proof of Proposition 6
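The first part follows by applying Proposition 2 to Proposition 4. For the proof of the second result, note from the proof of Proposition 5 that \(\hbox {Cov}\left( {{\varvec{Y}}_1 ,{\varvec{Y}}_2 } \right) =\varvec{\Sigma }_{12} +\varvec{\Lambda }_1 \left[ {\left( {k_2 -k_1^2 } \right) \frac{2}{\pi }\mathbf{1}_q \mathbf{1}_q^\top -\frac{2}{\pi }k_2 {\varvec{I}}_q } \right] \varvec{\Lambda }_2^\top \).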
Thus, using the partitions \({\varvec{I}}_q =\hbox {diag}\left( {{\varvec{I}}_{q_1 } ,{\varvec{I}}_{q_2 } } \right) \) and \(\mathbf{1}_q =\left( {\mathbf{1}_{q_1 }^\top ,\mathbf{1}_{q_2 }^\top } \right) ^{\top }\), the result follows. \(\square \)
A.2. Matrix variate priors for the skewness matrix
Consider matrix variate priors of the form \(\varvec{\Lambda }_k \sim MN_{p,q} \left( {{\varvec{N}}_k ,{\varvec{S}}_k ,{\varvec{F}}_k } \right) ,k=1,\ldots ,K\), where MN denotes the matrix normal distribution. These priors lead to the following posteriors in place of (19):
\(\left. {\hbox {vec}(\varvec{\Lambda }_k )} \right| \varvec{\Theta }_{\left( {-\varvec{\Lambda }_k } \right) } ,{\varvec{y}},{\varvec{u}},{\varvec{w}},z_i =k\sim N_{pq} \left( {{\varvec{\mu }} ,\varvec{\Sigma }} \right) ;k=1,\ldots ,K\), where
Here \({\varvec{L}}_{ik} ={\varvec{w}}_{ik} {\varvec{w}}_{ik}^\top \) and \({\varvec{M}}_{ik} =\left( {{\varvec{y}}_i -{\varvec{\mu }} _k } \right) {\varvec{w}}_{ik}^\top \), \(\otimes \) denotes the Kronecker product and \(\hbox {vec}\) denotes the vectorization of a matrix (a linear transformation that stacks the columns of the matrix into a single column vector).
Using these forms for the Gibbs updates may improve mixing and convergence to a stationary distribution. However, they involve matrix variate distributions with which users may not be familiar; hence, a simpler (computational) update is provided in the main text.
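For readers less familiar with matrix variate distributions, the sketch below illustrates how \(\hbox {vec}\) and \(\otimes \) connect a matrix normal variable to an ordinary multivariate normal vector. It assumes the common convention in which \({\varvec{S}}\) is a \(p\times p\) among-row covariance and \({\varvec{F}}\) is a \(q\times q\) among-column covariance, so that \(\hbox {vec}(\varvec{\Lambda })\) has covariance \({\varvec{F}}\otimes {\varvec{S}}\); the convention used in the paper is not shown in this appendix and may differ. The dimensions and numerical values are arbitrary.

```python
import numpy as np

def vec(mat):
    """Stack the columns of a matrix into one vector (column-major order)."""
    return mat.flatten(order="F")

rng = np.random.default_rng(2)
p, q = 3, 2                          # illustrative dimensions (assumed)
N = rng.normal(size=(p, q))          # hypothetical prior mean matrix
A = rng.normal(size=(p, p))
S = A @ A.T + np.eye(p)              # p x p among-row covariance (assumed convention)
C = rng.normal(size=(q, q))
F = C @ C.T + np.eye(q)              # q x q among-column covariance (assumed convention)

# One draw from MN(N, S, F): N + L_S Z L_F^T, with Z having i.i.d. N(0, 1) entries.
L_S = np.linalg.cholesky(S)
L_F = np.linalg.cholesky(F)
Z = rng.normal(size=(p, q))
Lam = N + L_S @ Z @ L_F.T

# Identity vec(A X B) = (B^T kron A) vec(X), applied with A = L_S and B = L_F^T.
lhs = vec(L_S @ Z @ L_F.T)
rhs = np.kron(L_F, L_S) @ vec(Z)

# Hence Cov(vec(Lam)) = (L_F kron L_S)(L_F kron L_S)^T = F kron S.
cov_vec = np.kron(L_F, L_S) @ np.kron(L_F, L_S).T

print(np.allclose(lhs, rhs), np.allclose(cov_vec, np.kron(F, S)))
```

Under this convention, a draw of the \(p\times q\) matrix can equivalently be obtained by sampling a \(pq\)-dimensional normal vector and reshaping it column by column.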
Cite this article
Maleki, M., Wraith, D. & Arellano-Valle, R.B. Robust finite mixture modeling of multivariate unrestricted skew-normal generalized hyperbolic distributions. Stat Comput 29, 415–428 (2019). https://doi.org/10.1007/s11222-018-9815-5
DOI: https://doi.org/10.1007/s11222-018-9815-5