Abstract
Mixed normal distributions are considered in additive and multiplicative forms. While the weighted arithmetic mean of the probability density functions typically exhibits several peaks corresponding to the parent sub-distributions, their weighted geometric mean always reduces to a single unimodal multivariate normal distribution. Estimation of the cluster center parameters from such a synthesized distribution is considered: the problem is solved by non-linear least squares optimization, yielding the cluster centers and sizes. The relationship to factor analysis by unweighted least squares and generalized least squares is noted, and numerical results are discussed. The described approach uses only the sample variance–covariance matrix, not the individual observations, so it can be applied to difficult clustering tasks on huge data sets drawn from databases and to data mining problems such as approximating the cluster centers and sizes. The suggested techniques can enrich both the theory and practice of clustering.
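The contrast between the additive and multiplicative mixtures can be illustrated with a small numerical sketch (the parameters below are arbitrary illustrative choices, not taken from the paper): the weighted arithmetic mean of two well-separated univariate normal densities is bimodal, while their weighted geometric mean has a single peak.

```python
import numpy as np

# Two univariate normal sub-distributions (illustrative parameters)
w1, w2 = 0.5, 0.5
mu1, s1 = -2.0, 1.0
mu2, s2 = 3.0, 1.0

x = np.linspace(-8, 9, 2001)
p1 = np.exp(-0.5 * ((x - mu1) / s1) ** 2) / (s1 * np.sqrt(2 * np.pi))
p2 = np.exp(-0.5 * ((x - mu2) / s2) ** 2) / (s2 * np.sqrt(2 * np.pi))

arith = w1 * p1 + w2 * p2   # additive (arithmetic) mixture
geom = p1 ** w1 * p2 ** w2  # multiplicative (geometric) mixture, up to a norming constant

def n_modes(p):
    """Count interior local maxima on the grid."""
    d = np.diff(p)
    return int(np.sum((d[:-1] > 0) & (d[1:] <= 0)))

print(n_modes(arith))  # 2 -- two peaks at the parent centers
print(n_modes(geom))   # 1 -- one unimodal normal bump
```

The geometric mixture here is proportional to a single normal density, in line with the result derived in the appendix.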

Acknowledgments
I thank the three reviewers for their help in improving the paper.
Appendix: Geometric mean of multinormal distributions
To find the explicit form for the geometric mean of the sub-distributions, use (1) in (3), which yields:

\[ p(x) = (2\pi)^{-n/2} \prod_{k=1}^{K} |\Sigma_k|^{-w_k/2} \exp\left\{ -\frac{1}{2} \sum_{k=1}^{K} w_k (x-\mu_k)' \Sigma_k^{-1} (x-\mu_k) \right\}, \quad (38) \]

where the total of the quadratic forms can be represented as follows:

\[ \sum_{k=1}^{K} w_k (x-\mu_k)' \Sigma_k^{-1} (x-\mu_k) = x' \Big( \sum_{k} w_k \Sigma_k^{-1} \Big) x - 2 x' \sum_{k} w_k \Sigma_k^{-1} \mu_k + \sum_{k} w_k \mu_k' \Sigma_k^{-1} \mu_k . \quad (39) \]

Completing the first terms to the whole square, we get:

\[ \sum_{k} w_k (x-\mu_k)' \Sigma_k^{-1} (x-\mu_k) = \left[ x - \Big( \sum_k w_k \Sigma_k^{-1} \Big)^{-1} \sum_k w_k \Sigma_k^{-1} \mu_k \right]' \Big( \sum_k w_k \Sigma_k^{-1} \Big) \left[ x - \Big( \sum_k w_k \Sigma_k^{-1} \Big)^{-1} \sum_k w_k \Sigma_k^{-1} \mu_k \right] + C . \quad (40) \]

Let us denote the weighted mean of the inverted covariance matrices in (40) as:

\[ \Sigma^{-1} = \sum_{k=1}^{K} w_k \Sigma_k^{-1} , \quad (41) \]

so the total covariance matrix is defined via the covariance matrices of the sub-distributions:

\[ \Sigma = \left( \sum_{k=1}^{K} w_k \Sigma_k^{-1} \right)^{-1} . \quad (42) \]

The weighted aggregate of the Fisher discriminator vectors \( \Sigma_k^{-1} \mu_k \) in (40) (in another formulation, a combination of the coefficients of the regressions of the binary indices of belonging to each class by the predictors) we denote as:

\[ b = \sum_{k=1}^{K} w_k \Sigma_k^{-1} \mu_k . \quad (43) \]

Then the result in (40) can be presented as:

\[ \sum_{k} w_k (x-\mu_k)' \Sigma_k^{-1} (x-\mu_k) = (x - x^{*})' \Sigma^{-1} (x - x^{*}) + C , \quad (44) \]

where the vector \( x^{ * } \) is defined as:

\[ x^{*} = \Sigma b = \left( \sum_k w_k \Sigma_k^{-1} \right)^{-1} \sum_k w_k \Sigma_k^{-1} \mu_k , \quad (45) \]

and the constant C is explained below. The vector (45) can also be presented as:

\[ x^{*} = \sum_{k=1}^{K} W_k \mu_k , \qquad W_k = \left( \sum_{j=1}^{K} w_j \Sigma_j^{-1} \right)^{-1} w_k \Sigma_k^{-1} , \quad (46) \]

so it is a weighted vector of the means, because the total matrix of weights in (46) equals the identity matrix:

\[ \sum_{k=1}^{K} W_k = \left( \sum_{j} w_j \Sigma_j^{-1} \right)^{-1} \sum_{k} w_k \Sigma_k^{-1} = I . \quad (47) \]

When \( x = x^{ * } \), the quadratic form (44) attains its minimum value C, so the density (38) reaches its maximum; the constant equals:

\[ C = \sum_{k} w_k \mu_k' \Sigma_k^{-1} \mu_k - b' \Sigma b = \sum_{k} w_k \mu_k' \Sigma_k^{-1} \mu_k - x^{*\prime} \Sigma^{-1} x^{*} + A , \quad (48) \]

where the additional constant A is actually zero, because

\[ A = x^{*\prime} \Sigma^{-1} x^{*} - b' \Sigma b = (\Sigma b)' \Sigma^{-1} (\Sigma b) - b' \Sigma b = 0 . \quad (49) \]

Thus, (44) is reduced explicitly to:

\[ \sum_{k} w_k (x-\mu_k)' \Sigma_k^{-1} (x-\mu_k) = (x - x^{*})' \Sigma^{-1} (x - x^{*}) + \sum_{k} w_k \mu_k' \Sigma_k^{-1} \mu_k - x^{*\prime} \Sigma^{-1} x^{*} , \quad (50) \]

and substituting it into (38) gives:

\[ p(x) = (2\pi)^{-n/2} \prod_{k=1}^{K} |\Sigma_k|^{-w_k/2} \exp\left\{ -\frac{1}{2} \Big( \sum_{k} w_k \mu_k' \Sigma_k^{-1} \mu_k - x^{*\prime} \Sigma^{-1} x^{*} \Big) \right\} \exp\left\{ -\frac{1}{2} (x - x^{*})' \Sigma^{-1} (x - x^{*}) \right\} . \quad (51) \]

Then the expression (51) can be reduced to the following:

\[ p(x) = p(x^{*}) \exp\left\{ -\frac{1}{2} (x - x^{*})' \Sigma^{-1} (x - x^{*}) \right\} , \quad (52) \]

which is the geometric mean (3) evaluated at the point \( x^{ * } \) (45), multiplied by one exponent that carries all the dependence on x.
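The appendix identities can be checked numerically. The following sketch (with arbitrary illustrative weights, centers, and covariances of my choosing, not values from the paper) builds the total covariance matrix as the inverse of the weighted mean of inverted covariances, the vector \( x^{ * } \) of (45), and verifies that the matrices of weights in (46) sum to the identity and that the weighted total of quadratic forms collapses to the single quadratic form of (44):

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 3, 2                                    # illustrative: 3 sub-distributions in 2 dimensions
w = np.array([0.5, 0.3, 0.2])                  # weights of the sub-distributions, summing to 1
mu = [rng.normal(size=n) for _ in range(K)]    # cluster centers
Sig = []
for _ in range(K):
    A = rng.normal(size=(n, n))
    Sig.append(A @ A.T + n * np.eye(n))        # positive-definite covariance matrices

inv = [np.linalg.inv(S) for S in Sig]
Sigma_inv = sum(wk * Si for wk, Si in zip(w, inv))        # weighted mean of inverted covariances
Sigma = np.linalg.inv(Sigma_inv)                          # total covariance matrix
b = sum(wk * Si @ mk for wk, Si, mk in zip(w, inv, mu))   # aggregate of Fisher discriminator vectors
x_star = Sigma @ b                                        # the vector x* of (45)

# The matrices of weights in (46) sum to the identity matrix
W = [Sigma @ (wk * Si) for wk, Si in zip(w, inv)]
print(np.allclose(sum(W), np.eye(n)))                     # True

# The weighted total of quadratic forms equals (x - x*)' Sigma^{-1} (x - x*) + C, as in (44)
C = sum(wk * mk @ Si @ mk for wk, Si, mk in zip(w, inv, mu)) - x_star @ Sigma_inv @ x_star
x = rng.normal(size=n)                                    # arbitrary test point
lhs = sum(wk * (x - mk) @ Si @ (x - mk) for wk, Si, mk in zip(w, inv, mu))
rhs = (x - x_star) @ Sigma_inv @ (x - x_star) + C
print(np.allclose(lhs, rhs))                              # True
```

Both checks hold at any test point x, since the completing-the-square identity is exact.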
Lipovetsky, S. Additive and multiplicative mixed normal distributions and finding cluster centers. Int. J. Mach. Learn. & Cyber. 4, 1–11 (2013). https://doi.org/10.1007/s13042-012-0070-3