Abstract
Mixed normal distributions are considered in additive and multiplicative forms. While the weighted arithmetic mean of the probability density functions typically exhibits several peaks corresponding to the parent sub-distributions, their weighted geometric mean always reduces to a single unimodal multivariate normal distribution. Estimation of the cluster center parameters from such a synthesized distribution is considered: the problem is solved by non-linear least squares optimization, yielding the cluster centers and sizes. The relationship to factor analysis by unweighted least squares and generalized least squares is noted, and numerical results are discussed. The described approach uses only the sample variance–covariance matrix, not the individual observations, so it can be applied to difficult clustering tasks on huge data sets drawn from databases and to data mining problems such as approximating the cluster centers and sizes. The suggested techniques can enrich both the theory and practice of clustering.
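The contrast between the additive and multiplicative mixtures can be illustrated with a small numerical sketch (the parameters below are arbitrary illustrative choices, not taken from the paper): the weighted arithmetic mean of two well-separated univariate normal densities is bimodal, while their weighted geometric mean has a single peak.

```python
import numpy as np

# Two univariate normal sub-distributions (illustrative parameters)
w1, w2 = 0.5, 0.5
mu1, s1 = -2.0, 1.0
mu2, s2 = 3.0, 1.0

x = np.linspace(-8, 9, 2001)
p1 = np.exp(-0.5 * ((x - mu1) / s1) ** 2) / (s1 * np.sqrt(2 * np.pi))
p2 = np.exp(-0.5 * ((x - mu2) / s2) ** 2) / (s2 * np.sqrt(2 * np.pi))

arith = w1 * p1 + w2 * p2   # additive (arithmetic) mixture
geom = p1 ** w1 * p2 ** w2  # multiplicative (geometric) mixture, up to a norming constant

def n_modes(p):
    """Count interior local maxima on the grid."""
    d = np.diff(p)
    return int(np.sum((d[:-1] > 0) & (d[1:] <= 0)))

print(n_modes(arith))  # 2 -- two peaks at the parent centers
print(n_modes(geom))   # 1 -- one unimodal normal bump
```

The geometric mixture here is proportional to a single normal density, in line with the result derived in the appendix.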

Acknowledgments
I thank the three reviewers for their help in improving the paper.
Appendix: Geometric mean of multinormal distributions
To find the explicit form for the geometric mean of the sub-distributions, use (1) in (3), which yields:

\[ p(x) = (2\pi)^{-n/2} \prod_{k=1}^{K} |\Sigma_k|^{-w_k/2} \exp\left\{ -\frac{1}{2} \sum_{k=1}^{K} w_k (x-\mu_k)' \Sigma_k^{-1} (x-\mu_k) \right\}, \quad (38) \]

where the total of the quadratic forms can be represented as follows:

\[ \sum_{k=1}^{K} w_k (x-\mu_k)' \Sigma_k^{-1} (x-\mu_k) = x' \Big( \sum_{k} w_k \Sigma_k^{-1} \Big) x - 2 x' \sum_{k} w_k \Sigma_k^{-1} \mu_k + \sum_{k} w_k \mu_k' \Sigma_k^{-1} \mu_k . \quad (39) \]

Completing the first terms to the whole square, we get:

\[ \sum_{k} w_k (x-\mu_k)' \Sigma_k^{-1} (x-\mu_k) = \left[ x - \Big( \sum_k w_k \Sigma_k^{-1} \Big)^{-1} \sum_k w_k \Sigma_k^{-1} \mu_k \right]' \Big( \sum_k w_k \Sigma_k^{-1} \Big) \left[ x - \Big( \sum_k w_k \Sigma_k^{-1} \Big)^{-1} \sum_k w_k \Sigma_k^{-1} \mu_k \right] + C . \quad (40) \]

Let us denote the weighted mean of the inverted covariance matrices in (40) as:

\[ \Sigma^{-1} = \sum_{k=1}^{K} w_k \Sigma_k^{-1} , \quad (41) \]

so the total covariance matrix is defined via the covariance matrices of the sub-distributions:

\[ \Sigma = \left( \sum_{k=1}^{K} w_k \Sigma_k^{-1} \right)^{-1} . \quad (42) \]

The weighted aggregate of the Fisher discriminator vectors \( \Sigma_k^{-1} \mu_k \) in (40) (in another formulation, a combination of the coefficients of the regressions of the binary indices of belonging to each class by the predictors) we denote as:

\[ b = \sum_{k=1}^{K} w_k \Sigma_k^{-1} \mu_k . \quad (43) \]

Then the result in (40) can be presented as:

\[ \sum_{k} w_k (x-\mu_k)' \Sigma_k^{-1} (x-\mu_k) = (x - x^{*})' \Sigma^{-1} (x - x^{*}) + C , \quad (44) \]

where the vector \( x^{ * } \) is defined as:

\[ x^{*} = \Sigma b = \left( \sum_k w_k \Sigma_k^{-1} \right)^{-1} \sum_k w_k \Sigma_k^{-1} \mu_k , \quad (45) \]

and the constant C is explained below. The vector (45) can also be presented as:

\[ x^{*} = \sum_{k=1}^{K} W_k \mu_k , \qquad W_k = \left( \sum_{j=1}^{K} w_j \Sigma_j^{-1} \right)^{-1} w_k \Sigma_k^{-1} , \quad (46) \]

so it is a weighted vector of the means, because the total matrix of weights in (46) equals the identity matrix:

\[ \sum_{k=1}^{K} W_k = \left( \sum_{j} w_j \Sigma_j^{-1} \right)^{-1} \sum_{k} w_k \Sigma_k^{-1} = I . \quad (47) \]

When \( x = x^{ * } \), the quadratic form (44) attains its minimum value C, so the density (38) reaches its maximum; the constant equals:

\[ C = \sum_{k} w_k \mu_k' \Sigma_k^{-1} \mu_k - b' \Sigma b = \sum_{k} w_k \mu_k' \Sigma_k^{-1} \mu_k - x^{*\prime} \Sigma^{-1} x^{*} + A , \quad (48) \]

where the additional constant A is actually zero, because

\[ A = x^{*\prime} \Sigma^{-1} x^{*} - b' \Sigma b = (\Sigma b)' \Sigma^{-1} (\Sigma b) - b' \Sigma b = 0 . \quad (49) \]

Thus, (44) is reduced explicitly to:

\[ \sum_{k} w_k (x-\mu_k)' \Sigma_k^{-1} (x-\mu_k) = (x - x^{*})' \Sigma^{-1} (x - x^{*}) + \sum_{k} w_k \mu_k' \Sigma_k^{-1} \mu_k - x^{*\prime} \Sigma^{-1} x^{*} , \quad (50) \]

and substituting it into (38) gives:

\[ p(x) = (2\pi)^{-n/2} \prod_{k=1}^{K} |\Sigma_k|^{-w_k/2} \exp\left\{ -\frac{1}{2} \Big( \sum_{k} w_k \mu_k' \Sigma_k^{-1} \mu_k - x^{*\prime} \Sigma^{-1} x^{*} \Big) \right\} \exp\left\{ -\frac{1}{2} (x - x^{*})' \Sigma^{-1} (x - x^{*}) \right\} . \quad (51) \]

Then the expression (51) can be reduced to the following:

\[ p(x) = p(x^{*}) \exp\left\{ -\frac{1}{2} (x - x^{*})' \Sigma^{-1} (x - x^{*}) \right\} , \quad (52) \]

which is the geometric mean (3) evaluated at the point \( x^{ * } \) (45), multiplied by one exponent that carries all the dependence on x.
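The appendix identities can be checked numerically. The following sketch (with arbitrary illustrative weights, centers, and covariances of my choosing, not values from the paper) builds the total covariance matrix as the inverse of the weighted mean of inverted covariances, the vector \( x^{ * } \) of (45), and verifies that the matrices of weights in (46) sum to the identity and that the weighted total of quadratic forms collapses to the single quadratic form of (44):

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 3, 2                                    # illustrative: 3 sub-distributions in 2 dimensions
w = np.array([0.5, 0.3, 0.2])                  # weights of the sub-distributions, summing to 1
mu = [rng.normal(size=n) for _ in range(K)]    # cluster centers
Sig = []
for _ in range(K):
    A = rng.normal(size=(n, n))
    Sig.append(A @ A.T + n * np.eye(n))        # positive-definite covariance matrices

inv = [np.linalg.inv(S) for S in Sig]
Sigma_inv = sum(wk * Si for wk, Si in zip(w, inv))        # weighted mean of inverted covariances
Sigma = np.linalg.inv(Sigma_inv)                          # total covariance matrix
b = sum(wk * Si @ mk for wk, Si, mk in zip(w, inv, mu))   # aggregate of Fisher discriminator vectors
x_star = Sigma @ b                                        # the vector x* of (45)

# The matrices of weights in (46) sum to the identity matrix
W = [Sigma @ (wk * Si) for wk, Si in zip(w, inv)]
print(np.allclose(sum(W), np.eye(n)))                     # True

# The weighted total of quadratic forms equals (x - x*)' Sigma^{-1} (x - x*) + C, as in (44)
C = sum(wk * mk @ Si @ mk for wk, Si, mk in zip(w, inv, mu)) - x_star @ Sigma_inv @ x_star
x = rng.normal(size=n)                                    # arbitrary test point
lhs = sum(wk * (x - mk) @ Si @ (x - mk) for wk, Si, mk in zip(w, inv, mu))
rhs = (x - x_star) @ Sigma_inv @ (x - x_star) + C
print(np.allclose(lhs, rhs))                              # True
```

Both checks hold at any test point x, since the completing-the-square identity is exact.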
Lipovetsky, S. Additive and multiplicative mixed normal distributions and finding cluster centers. Int. J. Mach. Learn. & Cyber. 4, 1–11 (2013). https://doi.org/10.1007/s13042-012-0070-3