Data visualization via latent variables and mixture models: a brief survey

Priam, Rodolphe; Nadif, Mohamed

doi:10.1007/s10044-015-0521-z

Short Paper
Published: 05 November 2015

Volume 19, pages 807–819, (2016)

Pattern Analysis and Applications

Rodolphe Priam¹ &
Mohamed Nadif²

Abstract

In the literature, data visualization has been studied extensively via diverse parametric probabilistic distributions for the exploration of continuous, binary, and count data. An overview of the existing methods for non-symmetric data matrices is presented in a unified framework via the Bernoulli law and binary variables. An extension to continuous or count variables is available by using any other univariate distribution instead, such as the Poisson or the Gaussian. Several approaches are possible, depending on whether the model places a distribution on the rows, the columns, the row clusters, the column clusters, the cells, the blocks, or a transformed matrix of the distances between pairs of rows or columns. The objective functions are presented with their full expressions in separate sections, one for each method: Kohonen’s map and related self-organizing map methods, generative topographic mapping as a probabilistic self-organizing map, linear principal component analysis and related matrix methods (non-negative factorization, factorization), probabilistic parametric embedding, probabilistic latent semantic visualization, latent cluster position model, and t-distributed stochastic neighbor embedding. The conclusion discusses the contribution and gives perspectives.

Notes

Note that an optimal value of the parameter \(\lambda\) may be found with a measure of the quality of the mapping, as proposed in [20].
This distribution illustrates well the double purpose of a visual representation. Here, only the central positions are visualized, and the corresponding sampled data are simply shown at the same coordinates. This can be summarized as (1) a hard (non-soft) clustering, followed by (2) a projection of the cluster means. This is the global part, which must be achieved in order to obtain a skeleton of the data cloud. The local part comes with the fuzzification, when the data vectors scatter around the mean centers.
The R package named VBLPCM was used for training the model. Convergence was observed after fewer than 61 steps.
The representation has to preserve sufficiently (1) the local neighborhood relations in the data cloud, so that similar data are found in the same area of the map, and (2) the global relations that give the data cloud its shape, so that the map offers a suitable view of its appearance and of the relative distances between its different sub-structures. When classes exist, the highest possible separation of their projections may be preferred, provided that it does not erase the visual information on the nearest neighbors of the projected points.
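The two-step summary in the second note (a hard clustering, then a projection of the cluster means) can be illustrated with a minimal sketch. It is not the model used in the article: plain k-means stands in for the hard clustering and a principal component projection stands in for the mapping, both chosen here only for illustration. The function names `hard_cluster` and `project_means` and the synthetic data are hypothetical.

```python
import numpy as np


def hard_cluster(X, n_clusters, n_iter=50, seed=0):
    """Plain k-means (assumed stand-in): one label per row, no soft weights."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centers with distinct rows drawn at random.
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[k] = members.mean(axis=0)
    # Final assignment so that labels match the returned centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centers


def project_means(centers, X, n_dims=2):
    """Project the cluster means on the leading principal axes of the data."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    # Principal axes are the right singular vectors of the centered data.
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return (centers - mu) @ vt[:n_dims].T


# Synthetic data (illustrative only): three groups in five dimensions.
rng = np.random.default_rng(1)
true_means = np.array(
    [[0, 0, 0, 0, 0], [6, 6, 0, 0, 0], [0, 6, 6, 0, 0]], dtype=float
)
X = np.vstack([m + rng.normal(scale=0.5, size=(40, 5)) for m in true_means])

labels, centers = hard_cluster(X, n_clusters=3)  # step (1): hard clustering
positions = project_means(centers, X)            # step (2): project the means
coords = positions[labels]  # every row is drawn at its cluster's position
```

In this sketch `coords` contains at most three distinct points, which corresponds to the "skeleton" of the data cloud described in the note. The local part, where rows scatter around their cluster position, is deliberately left out.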

References

Ambroise C, Govaert G (1996) Constrained clustering and kohonen self-organizing maps. J Classif 13(2):299–313
Anouar F, Badran F, Thiria S (1997) Self organizing map: a probabilistic approach. In: Proceedings of the WSOM'97, Finland, pp 339–344
Bacciu D, Micheli A, Sperduti A (2012) Compositional generative mapping for tree-structured data - part I: bottom-up probabilistic modeling of trees. IEEE Trans Neural Netw Learn Syst 23(12):1987–2002
Baek J, McLachlan G, Flack LK (2009) Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualisation of high-dimensional data. IEEE Trans Pattern Anal Mach Intell 32(7):1298–1309
Bakker R, Poole KT (2013) Bayesian metric multidimensional scaling. Polit Anal 21(1):125–140
Banerjee A, Dhillon IS, Ghosh J, Sra S (2005) Clustering on the unit hypersphere using von Mises-Fisher distributions. J Mach Learn Res 6:1345–1382
Basseville M (2013) Divergence measures for statistical data processing—An annotated bibliography. Signal Process 93(4):621–633
Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comp 15(6):1373–1396
Bilmes JA (1998) A gentle tutorial of the EM algorithm and its application to parameter estimation for gaussian mixture and hidden markov models. Tech. rep., ICSI, U.C. Berkeley
Bingham E, Kabán A, Fortelius M (2009) The aspect Bernoulli model: multiple causes of presences and absences. Pat Anal Appl 12(1):55–78
Bingham E, Mannila H (2001) Random projection in dimensionality reduction: Applications to image and text data. In: KDD ’01: Proceedings of the 7th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, pp 245–250
Bishop C, Svensén M, Williams CKI (1997) Magnification factors for the gtm algorithm. In: Fifth international conference on artificial neural networks, pp 64–69
Bishop CM, Svensén M, Williams CKI (1997) GTM: a principled alternative to the self-organizing map. In: Advances in neural information processing systems 9, pp 354–360
Bishop CM, Svensén M, Williams CKI (1998) Developments of the generative topographic mapping. Neurocomputing 21:203–224
Carreira-Perpiñán MA, Lu Z (2007) The Laplacian eigenmaps latent variable model. In: Proceedings of the Eleventh international conference on artificial intelligence and statistics (AISTATS -7), pp 59–66
Carreira-Perpiñán MA (2010) The elastic embedding algorithm for dimensionality reduction. In: Proceedings of the 27th international conference on machine learning (ICML '10), pp 167–174
Carter KM, Raich R, Hero AO (2008) FINE: Information embedding for document classification. In: ICASSP, pp 1861–1864
Chaibi A, Lebbah M, Azzag H (2013) A new bi-clustering approach using topological maps. In: Neural Networks (IJCNN), pp 1–7
Chang KY, Ghosh J (2001) A unified model for probabilistic principal surfaces. IEEE Trans Pattern Anal Mach Intell 23(1):22–41
Chen L, Buja A (2009) Local multidimensional scaling for nonlinear dimension reduction, graph drawing, and proximity analysis. J Am Stat Assoc 104(485):209–219
Choi JY, Qiu J, Pierce M, Fox G (2010) Generative topographic mapping by deterministic annealing. Procedia Comp Sci 1(1):47–56
Cruz-Barbosa R, Vellido A (2010) Semi-supervised geodesic generative topographic mapping. Pattern Recogn Lett 31(3):202–209
Daudin JJ, Picard F, Robin S (2008) A mixture model for random graphs. Stat Comp 18(2):173–183
Deerwester S, Dumais ST, Furnas GW, Landauer T, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Info Sci 41(6):391–407
Dempster AP, Laird NM, Rubin DB (1977) Maximum-likelihood from incomplete data via the EM algorithm. J Royal Stat Soc B 39:1–38
Ding C, Li T, Peng W (2008) On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing. Comp Stat Data Anal 52:3913–3927
Estévez PA, Figueroa CJ, Saito K (2005) Special issue: Cross-entropy embedding of high-dimensional data using the neural gas model. Neural Netw 18(5–6):727–737
Févotte C, Bertin N, Durrieu JL (2009) Nonnegative matrix factorization with the Itakura-Saito divergence: with application to music analysis. Neural Comp 21(3):793–830
Fort JC, Letremy P, Cottrell M (2002) Advantages and drawbacks of the batch kohonen algorithm. In: ESANN, pp 223–230
Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comp J 41(8):578–588
Girolami M (2001) The topographic organization and visualization of binary data using multivariate-bernoulli latent variable models. IEEE Trans Neural Netw 12(6):1367–1374
Gisbrecht A, Mokbel B, Hammer B (2011) Relational generative topographic mapping. Neurocomputing 74(9):1359–1371
Goldberger J, Roweis S, Hinton G, Salakhutdinov R (2005) Neighbourhood components analysis. In: Advances in neural information processing systems 17, pp 513–520
Gollini I, Murphy TB (2014) Mixture of latent trait analyzers for model-based clustering of categorical data. Stat Comp 24(4):569–588
Govaert G (1983) Classification croisée. Thèse d’état, Université Paris 6, France
Govaert G (1995) Simultaneous clustering of rows and columns. Control Cybern 24(4):437–458
Govaert G, Nadif M (2003) Clustering with block mixture models. Patt Recogn 36(2):463–473
Govaert G, Nadif M (2005) An EM algorithm for the block mixture model. IEEE Trans Pattern Anal Mach Intell 27(4):643–647
Gupta MR, Chen Y (2011) Theory and use of the em algorithm. Found Trends Signal Process 4(3):223–296
Handcock MS, Raftery AE, Tantrum JM (2007) Model-based clustering for social networks. J Royal Stat Soc A 170(2):301–354
Hathaway RJ (1986) Another interpretation of the EM algorithm for mixture distributions. Stat Probab Lett 4(2):53–56
Hernández-Lobato JM, Houlsby N, Ghahramani Z (2014) Stochastic inference for scalable probabilistic modeling of binary matrices. In: Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp 379–387
Hinton G, Roweis S (2003) Stochastic neighbor embedding. In: Advances in Neural Information Processing Systems 15, pp 857–864
Hoff PD, Raftery AE, Handcock MS (2002) Latent space approaches to social network analysis. J Am Stat Assoc 97(460):1090–1098
Hofmann T (2000) ProbMap - a probabilistic approach for mapping large document collections. Intell Data Anal 4(2):149–164
Hofmann T, Puzicha J (1998) Statistical models for co-occurrence data. Tech. Rep. AIM-1625, MIT
Hofmann T, Schölkopf B, Smola AJ (2008) Kernel methods in machine learning. Ann Stat 36(3):1171–1220
Hsu CC (2006) Generalizing self-organizing map for categorical data. IEEE Trans Neural Netw 17(2):294–304
Iwata T, Saito K, Ueda N, Stromsten S, Griffiths TL, Tenenbaum JB (2007) Parametric embedding for class visualization. Neural Comp 19(9):2536–2556
Iwata T, Yamada T, Ueda N (2008) Probabilistic latent semantic visualization: topic model for visualizing documents. In: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining., KDD ’08, New York, pp 363–371
Jolliffe I (2002) Principal component analysis. Springer Verlag
Juan A, Vidal E (2002) On the use of bernoulli mixture models for text classification. Patt Recogn 35(12):2705–2710
Kabán A (2007) Predictive modelling of heterogeneous sequence collections by topographic ordering of histories. Mach Learn 68(1):63–95
Kabán A, Bingham E, Hirsimaki T (2004) Learning to read between the lines: the aspect bernoulli model. In: Proceedings of the 2004 SIAM International Conference on Data Mining (SIAM DM04), pp 462–466
Kabán A, Girolami M (2001) A combined latent class and trait model for analysis and visualisation of discrete data. IEEE Trans Pattern Anal Mach Intell 23(8):859–872
Kabán A, Sun J, Raychaudhury S, Nolan L (2006) On class visualisation for high dimensional data: Exploring scientific data sets. In: Proceedings of the 9th International Conference on Discovery Science, DS’06, Springer-Verlag pp 125–136
Kiang MY (2001) Extending the kohonen, self-organizing map networks for clustering analysis. Comp Stat Data Anal 38(2):161–180
Klock H, Buhmann JM (2000) Data visualization by multidimensional scaling: a deterministic annealing approach. Patt Recogn 33(4):651–669
Kohonen T (1997) Self-organizing maps. Springer
Kozma L, Ilin A, Raiko T (2009) Binary principal component analysis in the netflix collaborative filtering task. In: IEEE International Workshop on Machine Learning for Signal Processing, pp 1–6
Lawrence ND (2005) Probabilistic non-linear principal component analysis with gaussian process latent variable models. J Mach Learn Res 6:1783–1816
Lawrence ND (2012) A unifying probabilistic perspective for spectral dimensionality reduction: Insights and new models. J Mach Learn Res 13:1609–1638
Le TV, Lauw HW (2014) Manifold learning for jointly modeling topic and visualization. In: Twenty-Eighth AAAI Conference on Artificial Intelligence, pp 1960–1967
Le TV, Lauw HW (2014) Probabilistic latent document network embedding. In: ICDM, pp 270–279
Lebart L, Morineau A, Warwick K (1984) Multivariate descriptive statistical analysis. Wiley
Lebbah M, Rogovschi N, Bennani Y (2007) BeSOM: Bernoulli on self-organizing map. In: IJCNN, pp 631–636
Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. In: Advances in Neural Information Processing Systems 13, pp 556–562
Lee JA, Renard E, Bernard G, Dupont P, Verleysen M (2013) Type 1 and 2 mixtures of Kullback-Leibler divergences as cost functions in dimensionality reduction based on similarity preservation. Neurocomputing 112:92–108
Lee JA, Verleysen M (2009) Quality assessment of dimensionality reduction: Rank-based criteria. Neurocomputing 72(7–9):1431–1443
Lee JA, Verleysen M (2010) Scale-independent quality criteria for dimensionality reduction. Pattern Recogn Lett 31(14):2248–2257
Lee S, Huang JZ, Hu J (2010) Sparse logistic principal components analysis for binary data. Ann Appl Stat 4(3):1579–1601
López-Rubio E (2010) Probabilistic self-organizing maps for qualitative data. Neural Netw 23(10):1208–1225
Luttrell SP (1994) A bayesian analysis of self-organising maps. Neural Comp 6(5):767–794
van der Maaten L, Hinton G (2008) Visualizing Data using t-SNE. J Mach Learn Res 9:2579–2605
Makarenkov V, Legendre P (2002) Nonlinear Redundancy Analysis and Canonical Correspondence Analysis Based on Polynomial Regression. Ecology 83(4):1146–1161
Maniyar D, Nabney I (2006) Data visualization with simultaneous feature selection. In: Computational Intelligence and Bioinformatics and Computational Biology CIBCB ’06, pp 1–8
McLachlan GJ, Basford KE (1988) Mixture Models. Inference and applications to clustering. Marcel Dekker, New York
McLachlan GJ, Peel D (2000) Finite Mixture Models. Wiley, New York
Mirisaee SH, Gaussier E, Termier A (2015) Improved local search for binary matrix factorization. In: Twenty-Ninth AAAI Conference on Artificial Intelligence, pp 1198–1204
Noack A (2003) Energy models for drawing clustered small-world graphs. Tech. rep, BTU Cottbus
Oh MS, Raftery AE (2001) Bayesian Multidimensional Scaling and Choice of Dimension. J Am Stat Assoc 96(455):1031–1044
Olier I, Vellido A (2008) Advances in clustering and visualization of time series using GTM through time. Neural Netw 21(7):904–913
Olier I, Vellido A (2008) Variational bayesian generative topographic mapping. J Math Model Algor 7(4):371–387
Olier I, Vellido A, Giraldo J (2010) Kernel generative topographic mapping. In: ESANN, pp 481–486
Paatero P (1997) Least squares formulation of robust non-negative factor analysis. Chemom Intell Lab Syst 37(1):23–35
Park M, Jitkrittum W, Qamar A, Szabo Z, Buesing L, Sahani M (2015) Bayesian Manifold Learning: Locally Linear Latent Variable Model (LL-LVM). Arxiv preprint. http://arxiv.org/pdf/1410.6791v3.pdf
Priam R (2005) CASOM: Som for contingency tables and biplot. In: Proceedings of the WSOM'05, Paris, pp 379–386
Priam R, Nadif M (2012) Generative topographic mapping and factor analyzers. In: ICPRAM (1), pp 284–287
Priam R, Nadif M, Govaert G (2014) Topographic bernoulli block mixture mapping for binary tables. Patt Anal Appl 17(4):839–847
Priam R, Nadif M, Govaert G (2015) Generalized topographic block model. Neurocomputing (in press)
Roweis S, Ghahramani Z (1999) A unifying review of linear gaussian models. Neural Comp 11(2):305–345
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326
Salakhutdinov R, Mnih A (2008) Probabilistic matrix factorization. In: Advances in Neural Information Processing Systems 20, pp 1257–1264
Salter-Townshend M, Murphy TB (2013) Variational bayesian inference for the latent position cluster model for network data. Comp Stat Data Anal 57(1):661–671
Salter-Townshend M, White A, Gollini I, Murphy TB (2012) Review of statistical network analysis: models, algorithms, and software. Stat Anal Data Mining 5(4):243–264
Sammon J (1969) A nonlinear mapping for data structure analysis. IEEE Trans Comp C-18(5):401–409
Schein AI, Saul LK, Ungar LH (2003) A Generalized Linear Model for Principal Component Analysis of Binary Data. In: Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (AISTATS-9), pp 14–21
Silvestre C, Cardoso M, Figueiredo M (2014) Identifying the number of clusters in discrete mixture models. Arxiv preprint. http://arxiv.org/pdf/1409.7419.pdf
Singh AP, Gordon GJ (2008) A unified view of matrix factorization models. In: ECML PKDD, LNAI 5212, pp 358–373
Stulp F, Sigaud O (2015) Many regression algorithms, one unified model: a review. Neural Netw 69:60–79
Sun S (2013) A review of deterministic approximate inference techniques for bayesian machine learning. Neural Comp Appl 23(7):2039–2050
Symons MJ (1981) Clustering criteria and multivariate normal mixtures. Biometrics 37:35–43
Tino P, Nabney I (2002) Hierarchical GTM: constructing localized nonlinear projection manifolds in a principled way. IEEE Trans Pattern Anal Mach Intell 24(5):639–656
Tino P, Nabney I, Sun Y (2001) Using directional curvatures to visualize folding patterns of the GTM projection manifolds. In: ICANN, pp 421–428
Tipping ME (1999) Probabilistic visualisation of high-dimensional binary data. In: Advances in Neural Information Processing Systems 11, pp 592–598
Tipping ME, Bishop CM (1999) Mixtures of probabilistic principal component analyzers. Neural Comp 11(2):443–482
Tipping ME, Bishop CM (1999) Probabilistic principal component analysis. J Royal Stat Soc B 61(3):611–622
Titsias MK, Lawrence ND (2010) Bayesian gaussian process latent variable model. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS-10), 9:844–851
Hastie T, Stuetzle W (1989) Principal Curves. J Am Stat Assoc 84(406):502–516
Utsugi A (1997) Hyperparameter selection for self-organizing maps. Neural Comp 9:623–635
Utsugi A (2000) Bayesian sampling and ensemble learning in generative topographic mapping. Neural Process Lett 12(3):277–290
Van Hulle M (2012) Self-organizing maps. In: Handbook of Natural Computing. Springer, Berlin, Heidelberg, pp 585–622
Vellido A (2006) Assessment of an unsupervised feature selection method for generative topographic mapping. In: ICANN, pp 361–370
Vellido A, El-Deredy W, Lisboa PJG (2003) Selective smoothing of the generative topographic mapping. IEEE Trans Neural Networks 14(4):847–852
Verbeek JJ, Vlassis N, Krose BJA (2002) The generative self-organizing map: a probabilistic generalization of Kohonen’s SOM. Tech. rep., IAS-UVA-02-03
Willenbockel CT, Schütte C (2015) A variational bayesian algorithm for clustering of large and complex networks. Tech. Rep. 15-25, ZIB
Yamaguchi N (2012) Variational bayesian inference with automatic relevance determination for generative topographic mapping. In: SCIS-ISIS, pp 2124–2129
Yin H (2008) The self-organizing maps: Background, theories, extensions and applications. In: Computational intelligence: a compendium, studies in computational intelligence, vol 115. Springer, Berlin, Heidelberg, pp 715–762

Author information

Authors and Affiliations

S3RI, University of Southampton, University Road, Southampton, SO17 1BJ, UK
Rodolphe Priam
LIPADE, Université Paris Descartes, UFR Mathématiques- Informatique, 45, rue des Saints Pères, 75270, Paris, France
Mohamed Nadif

Corresponding author

Correspondence to Rodolphe Priam.

About this article

Cite this article

Priam, R., Nadif, M. Data visualization via latent variables and mixture models: a brief survey. Pattern Anal Applic 19, 807–819 (2016). https://doi.org/10.1007/s10044-015-0521-z

Received: 30 January 2015
Accepted: 17 October 2015
Published: 05 November 2015
Issue Date: August 2016
DOI: https://doi.org/10.1007/s10044-015-0521-z