Data visualization via latent variables and mixture models: a brief survey

  • Short Paper
  • Pattern Analysis and Applications

Abstract

In the literature, data visualization has been studied extensively via diverse parametric probability distributions for the exploration of continuous, binary, and count data. An overview of the existing methods for non-symmetric data matrices is presented in a unified framework based on the Bernoulli law and binary variables. An extension to continuous or count variables is obtained by using any other univariate distribution instead, such as the Poisson or the Gaussian. Several approaches are possible, depending on whether the model places a distribution on the rows, the columns, the row clusters, the column clusters, the cells, the blocks, or a transformed matrix of the distances between pairs of rows or columns. The objective functions are presented with their full expressions in separate sections, one for each method: Kohonen’s map and related self-organizing map methods, generative topographic mapping as a probabilistic self-organizing map, linear principal component analysis and related matrix methods (non-negative factorization, factorization), probabilistic parametric embedding, probabilistic latent semantic visualization, latent cluster position model, and t-distributed stochastic neighbor embedding. The conclusion discusses the contribution and gives perspectives.
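
As an illustration of the Bernoulli setting described above, a generic mixture of independent Bernoulli laws over the rows of a binary matrix has a log-likelihood of the following form. The notation is an assumption made for this sketch and is not taken from the paper: \(x_{ij} \in \{0,1\}\) is the cell in row \(i\) and column \(j\) of an \(n \times d\) matrix, \(\pi_k\) is the proportion of cluster \(k\) among \(K\) clusters, and \(\mu_{kj}\) is the probability of a one in column \(j\) for cluster \(k\).

\[
\log L(\theta) = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k \prod_{j=1}^{d} \mu_{kj}^{x_{ij}} \, (1-\mu_{kj})^{1-x_{ij}}
\]

Replacing the Bernoulli factor with a Poisson or Gaussian density gives the corresponding variant for count or continuous data.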

Notes

  1. Note that an optimal value of the parameter \(\lambda\) may be found by using a measure of the quality of the mapping, as proposed in [20].

  2. This distribution illustrates well the double purpose of a visual representation. Here, only the central positions are visualized, and the corresponding sampled data are simply shown at the same coordinates. This can be summarized as (1) a hard (non-soft) clustering, followed by (2) a projection of the cluster means. This is the global part of the task, which yields a skeleton of the data cloud. The local part comes with the fuzzification, when the data vectors scatter around the cluster means.

  3. The R package VBLPCM was used for training the model. Convergence was observed in fewer than 61 steps.

  4. The representation has to preserve sufficiently (1) the local neighborhood relations in the data cloud, so that similar data are found in the same area of the map, and (2) the global relations that determine the shape of the data cloud, so as to give a suitable view of its appearance and of the relative distances between its sub-structures. When classes exist, the highest possible separation of their projections may be preferred, provided that the visual information on the nearest neighbors of the projected points is not lost.
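
The local requirement in the notes above, that near neighbors in the data stay near neighbors on the map, can be checked numerically. The sketch below computes a simple neighborhood-overlap score with NumPy. It is an illustration only and is not claimed to be the exact quality measure of [20]; the function names `knn_indices` and `neighbor_overlap` and the choice of Euclidean distance are assumptions made for this example.

```python
import numpy as np

def knn_indices(points, k):
    """Return, for each row, the indices of its k nearest other rows (Euclidean)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not counted as its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def neighbor_overlap(data, embedding, k):
    """Mean fraction of each point's k nearest neighbors that is shared
    between the original data space and the low-dimensional embedding."""
    nn_data = knn_indices(data, k)
    nn_map = knn_indices(embedding, k)
    shared = [len(set(a) & set(b)) for a, b in zip(nn_data, nn_map)]
    return float(np.mean(shared)) / k
```

The score lies between 0 and 1, and an embedding identical to the data gives exactly 1. A score of this kind could be evaluated over a grid of values of a tuning parameter such as \(\lambda\) in order to compare the resulting maps. It only measures the local part of the requirement, so the global shape of the data cloud still has to be assessed separately.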

References

  1. Ambroise C, Govaert G (1996) Constrained clustering and Kohonen self-organizing maps. J Classif 13(2):299–313

  2. Anouar F, Badran F, Thiria S (1997) Self organizing map: a probabilistic approach. In: Proceedings of the WSOM'97, Finland, pp 339–344

  3. Bacciu D, Micheli A, Sperduti A (2012) Compositional generative mapping for tree-structured data - part I: bottom-up probabilistic modeling of trees. IEEE Trans Neural Netw Learn Syst 23(12):1987–2002

  4. Baek J, McLachlan G, Flack LK (2009) Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualisation of high-dimensional data. IEEE Trans Pattern Anal Mach Intell 32(7):1298–1309

  5. Bakker R, Poole KT (2013) Bayesian metric multidimensional scaling. Polit Anal 21(1):125–140

  6. Banerjee A, Dhillon IS, Ghosh J, Sra S (2005) Clustering on the unit hypersphere using von Mises-Fisher distributions. J Mach Learn Res 6:1345–1382

  7. Basseville M (2013) Divergence measures for statistical data processing—An annotated bibliography. Signal Process 93(4):621–633

  8. Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comp 15(6):1373–1396

  9. Bilmes JA (1998) A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Tech. rep., ICSI, U.C. Berkeley

  10. Bingham E, Kabán A, Fortelius M (2009) The aspect Bernoulli model: multiple causes of presences and absences. Pat Anal Appl 12(1):55–78

  11. Bingham E, Mannila H (2001) Random projection in dimensionality reduction: Applications to image and text data. In: KDD ’01: Proceedings of the 7th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, pp 245–250

  12. Bishop C, Svensén M, Williams CKI (1997) Magnification factors for the GTM algorithm. In: Fifth international conference on artificial neural networks, pp 64–69

  13. Bishop CM, Svensén M, Williams CKI (1997) GTM: a principled alternative to the self-organizing map. In: Advances in neural information processing systems 9, pp 354–360

  14. Bishop CM, Svensén M, Williams CKI (1998) Developments of the generative topographic mapping. Neurocomputing 21:203–224

  15. Carreira-Perpiñán MA, Lu Z (2007) The Laplacian eigenmaps latent variable model. In: Proceedings of the Eleventh international conference on artificial intelligence and statistics (AISTATS-07), pp 59–66

  16. Carreira-Perpiñán MA (2010) The elastic embedding algorithm for dimensionality reduction. In: Proceedings of the 27th international conference on machine learning (ICML '10), pp 167–174

  17. Carter KM, Raich R, Hero AO (2008) FINE: Information embedding for document classification. In: ICASSP, pp 1861–1864

  18. Chaibi A, Lebbah M, Azzag H (2013) A new bi-clustering approach using topological maps. In: Neural Networks (IJCNN), pp 1–7

  19. Chang KY, Ghosh J (2001) A unified model for probabilistic principal surfaces. IEEE Trans Pattern Anal Mach Intell 23(1):22–41

  20. Chen L, Buja A (2009) Local multidimensional scaling for nonlinear dimension reduction, graph drawing, and proximity analysis. J Am Stat Assoc 104(485):209–219

  21. Choi JY, Qiu J, Pierce M, Fox G (2010) Generative topographic mapping by deterministic annealing. Procedia Comp Sci 1(1):47–56

  22. Cruz-Barbosa R, Vellido A (2010) Semi-supervised geodesic generative topographic mapping. Pattern Recogn Lett 31(3):202–209

  23. Daudin JJ, Picard F, Robin S (2008) A mixture model for random graphs. Stat Comp 18(2):173–183

  24. Deerwester S, Dumais ST, Furnas GW, Landauer T, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Info Sci 41(6):391–407

  25. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc B 39:1–38

  26. Ding C, Li T, Peng W (2008) On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing. Comp Stat Data Anal 52:3913–3927

  27. Estévez PA, Figueroa CJ, Saito K (2005) Special issue: Cross-entropy embedding of high-dimensional data using the neural gas model. Neural Netw 18(5–6):727–737

  28. Févotte C, Bertin N, Durrieu JL (2009) Nonnegative matrix factorization with the Itakura-Saito divergence: with application to music analysis. Neural Comp 21(3):793–830

  29. Fort JC, Letremy P, Cottrell M (2002) Advantages and drawbacks of the batch Kohonen algorithm. In: ESANN, pp 223–230

  30. Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comp J 41(8):578–588

  31. Girolami M (2001) The topographic organization and visualization of binary data using multivariate-Bernoulli latent variable models. IEEE Trans Neural Netw 12(6):1367–1374

  32. Gisbrecht A, Mokbel B, Hammer B (2011) Relational generative topographic mapping. Neurocomputing 74(9):1359–1371

  33. Goldberger J, Roweis S, Hinton G, Salakhutdinov R (2005) Neighbourhood components analysis. In: Advances in neural information processing systems 17, pp 513–520

  34. Gollini I, Murphy TB (2014) Mixture of latent trait analyzers for model-based clustering of categorical data. Stat Comp 24(4):569–588

  35. Govaert G (1983) Classification croisée. Thèse d’état, Université Paris 6, France

  36. Govaert G (1995) Simultaneous clustering of rows and columns. Control Cybern 24(4):437–458

  37. Govaert G, Nadif M (2003) Clustering with block mixture models. Patt Recogn 36(2):463–473

  38. Govaert G, Nadif M (2005) An EM algorithm for the block mixture model. IEEE Trans Pattern Anal Mach Intell 27(4):643–647

  39. Gupta MR, Chen Y (2011) Theory and use of the EM algorithm. Found Trends Signal Process 4(3):223–296

  40. Handcock MS, Raftery AE, Tantrum JM (2007) Model-based clustering for social networks. J Royal Stat Soc A 170(2):301–354

  41. Hathaway RJ (1986) Another interpretation of the EM algorithm for mixture distributions. Stat Probab Lett 4(2):53–56

  42. Hernández-Lobato JM, Houlsby N, Ghahramani Z (2014) Stochastic inference for scalable probabilistic modeling of binary matrices. In: Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp 379–387

  43. Hinton G, Roweis S (2003) Stochastic neighbor embedding. In: Advances in Neural Information Processing Systems 15, pp 857–864

  44. Hoff PD, Raftery AE, Handcock MS (2002) Latent space approaches to social network analysis. J Am Stat Assoc 97(460):1090–1098

  45. Hofmann T (2000) ProbMap - a probabilistic approach for mapping large document collections. Intell Data Anal 4(2):149–164

  46. Hofmann T, Puzicha J (1998) Statistical models for co-occurrence data. Tech. Rep. AIM-1625, MIT

  47. Hofmann T, Schölkopf B, Smola AJ (2008) Kernel methods in machine learning. Ann Stat 36(3):1171–1220

  48. Hsu CC (2006) Generalizing self-organizing map for categorical data. IEEE Trans Neural Netw 17(2):294–304

  49. Iwata T, Saito K, Ueda N, Stromsten S, Griffiths TL, Tenenbaum JB (2007) Parametric embedding for class visualization. Neural Comp 19(9):2536–2556

  50. Iwata T, Yamada T, Ueda N (2008) Probabilistic latent semantic visualization: topic model for visualizing documents. In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’08, New York, pp 363–371

  51. Jolliffe I (2002) Principal component analysis. Springer Verlag

  52. Juan A, Vidal E (2002) On the use of Bernoulli mixture models for text classification. Patt Recogn 35(12):2705–2710

  53. Kabán A (2007) Predictive modelling of heterogeneous sequence collections by topographic ordering of histories. Mach Learn 68(1):63–95

  54. Kabán A, Bingham E, Hirsimaki T (2004) Learning to read between the lines: the aspect Bernoulli model. In: Proceedings of the 2004 SIAM International Conference on Data Mining (SIAM DM04), pp 462–466

  55. Kabán A, Girolami M (2001) A combined latent class and trait model for analysis and visualisation of discrete data. IEEE Trans Pattern Anal Mach Intell 23(8):859–872

  56. Kabán A, Sun J, Raychaudhury S, Nolan L (2006) On class visualisation for high dimensional data: Exploring scientific data sets. In: Proceedings of the 9th International Conference on Discovery Science, DS’06, Springer-Verlag, pp 125–136

  57. Kiang MY (2001) Extending the Kohonen self-organizing map networks for clustering analysis. Comp Stat Data Anal 38(2):161–180

  58. Klock H, Buhmann JM (2000) Data visualization by multidimensional scaling: a deterministic annealing approach. Patt Recogn 33(4):651–669

  59. Kohonen T (1997) Self-organizing maps. Springer

  60. Kozma L, Ilin A, Raiko T (2009) Binary principal component analysis in the Netflix collaborative filtering task. In: IEEE International Workshop on Machine Learning for Signal Processing, pp 1–6

  61. Lawrence ND (2005) Probabilistic non-linear principal component analysis with Gaussian process latent variable models. J Mach Learn Res 6:1783–1816

  62. Lawrence ND (2012) A unifying probabilistic perspective for spectral dimensionality reduction: Insights and new models. J Mach Learn Res 13:1609–1638

  63. Le TV, Lauw HW (2014) Manifold learning for jointly modeling topic and visualization. In: Twenty-Eighth AAAI Conference on Artificial Intelligence, pp 1960–1967

  64. Le TV, Lauw HW (2014) Probabilistic latent document network embedding. In: ICDM, pp 270–279

  65. Lebart L, Morineau A, Warwick K (1984) Multivariate descriptive statistical analysis. Wiley

  66. Lebbah M, Rogovschi N, Bennani Y (2007) BeSOM: Bernoulli on self-organizing map. In: IJCNN, pp 631–636

  67. Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. In: Advances in Neural Information Processing Systems 13, pp 556–562

  68. Lee JA, Renard E, Bernard G, Dupont P, Verleysen M (2013) Type 1 and 2 mixtures of Kullback-Leibler divergences as cost functions in dimensionality reduction based on similarity preservation. Neurocomputing 112:92–108

  69. Lee JA, Verleysen M (2009) Quality assessment of dimensionality reduction: Rank-based criteria. Neurocomputing 72(7–9):1431–1443

  70. Lee JA, Verleysen M (2010) Scale-independent quality criteria for dimensionality reduction. Pattern Recogn Lett 31(14):2248–2257

  71. Lee S, Huang JZ, Hu J (2010) Sparse logistic principal components analysis for binary data. Ann Appl Stat 4(3):1579–1601

  72. López-Rubio E (2010) Probabilistic self-organizing maps for qualitative data. Neural Netw 23(10):1208–1225

  73. Luttrell SP (1994) A Bayesian analysis of self-organising maps. Neural Comp 6(5):767–794

  74. van der Maaten L, Hinton G (2008) Visualizing Data using t-SNE. J Mach Learn Res 9:2579–2605

  75. Makarenkov V, Legendre P (2002) Nonlinear Redundancy Analysis and Canonical Correspondence Analysis Based on Polynomial Regression. Ecology 83(4):1146–1161

  76. Maniyar D, Nabney I (2006) Data visualization with simultaneous feature selection. In: Computational Intelligence and Bioinformatics and Computational Biology CIBCB ’06, pp 1–8

  77. McLachlan GJ, Basford KE (1988) Mixture Models. Inference and applications to clustering. Marcel Dekker, New York

  78. McLachlan GJ, Peel D (2000) Finite Mixture Models. Wiley, New York

  79. Mirisaee SH, Gaussier E, Termier A (2015) Improved local search for binary matrix factorization. In: Twenty-Ninth AAAI Conference on Artificial Intelligence, pp 1198–1204

  80. Noack A (2003) Energy models for drawing clustered small-world graphs. Tech. rep., BTU Cottbus

  81. Oh MS, Raftery AE (2001) Bayesian Multidimensional Scaling and Choice of Dimension. J Am Stat Assoc 96(455):1031–1044

  82. Olier I, Vellido A (2008) Advances in clustering and visualization of time series using GTM through time. Neural Netw 21(7):904–913

  83. Olier I, Vellido A (2008) Variational Bayesian generative topographic mapping. J Math Model Algor 7(4):371–387

  84. Olier I, Vellido A, Giraldo J (2010) Kernel generative topographic mapping. In: ESANN, pp 481–486

  85. Paatero P (1997) Least squares formulation of robust non-negative factor analysis. Chemom Intell Lab Syst 37(1):23–35

  86. Park M, Jitkrittum W, Qamar A, Szabo Z, Buesing L, Sahani M (2015) Bayesian Manifold Learning: Locally Linear Latent Variable Model (LL-LVM). Arxiv preprint. http://arxiv.org/pdf/1410.6791v3.pdf

  87. Priam R (2005) CASOM: SOM for contingency tables and biplot. In: Proceedings of the WSOM'05, Paris, pp 379–386

  88. Priam R, Nadif M (2012) Generative topographic mapping and factor analyzers. In: ICPRAM (1), pp 284–287

  89. Priam R, Nadif M, Govaert G (2014) Topographic Bernoulli block mixture mapping for binary tables. Patt Anal Appl 17(4):839–847

  90. Priam R, Nadif M, Govaert G (2015) Generalized topographic block model. Neurocomputing (in press)

  91. Roweis S, Ghahramani Z (1999) A unifying review of linear gaussian models. Neural Comp 11(2):305–345

  92. Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326

  93. Salakhutdinov R, Mnih A (2008) Probabilistic matrix factorization. In: Advances in Neural Information Processing Systems 20, pp 1257–1264

  94. Salter-Townshend M, Murphy TB (2013) Variational Bayesian inference for the latent position cluster model for network data. Comp Stat Data Anal 57(1):661–671

  95. Salter-Townshend M, White A, Gollini I, Murphy TB (2012) Review of statistical network analysis: models, algorithms, and software. Stat Anal Data Mining 5(4):243–264

  96. Sammon J (1969) A nonlinear mapping for data structure analysis. IEEE Trans Comp C-18(5):401–409

  97. Schein AI, Saul LK, Ungar LH (2003) A Generalized Linear Model for Principal Component Analysis of Binary Data. In: Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (AISTATS-9), pp 14–21

  98. Silvestre C, Cardoso M, Figueiredo M (2014) Identifying the number of clusters in discrete mixture models. Arxiv preprint. http://arxiv.org/pdf/1409.7419.pdf

  99. Singh AP, Gordon GJ (2008) A unified view of matrix factorization models. In: ECML PKDD, LNAI 5212, pp 358–373

  100. Stulp F, Sigaud O (2015) Many regression algorithms, one unified model: a review. Neural Netw 69:60–79

  101. Sun S (2013) A review of deterministic approximate inference techniques for Bayesian machine learning. Neural Comp Appl 23(7):2039–2050

  102. Symons MJ (1981) Clustering criteria and multivariate normal mixtures. Biometrics 37:35–43

  103. Tino P, Nabney I (2002) Hierarchical GTM: constructing localized nonlinear projection manifolds in a principled way. IEEE Trans Pattern Anal Mach Intell 24(5):639–656

  104. Tino P, Nabney I, Sun Y (2001) Using directional curvatures to visualize folding patterns of the GTM projection manifolds. In: ICANN, pp 421–428

  105. Tipping ME (1999) Probabilistic visualisation of high-dimensional binary data. In: Advances in Neural Information Processing Systems 11, pp 592–598

  106. Tipping ME, Bishop CM (1999) Mixtures of probabilistic principal component analyzers. Neural Comp 11(2):443–482

  107. Tipping ME, Bishop CM (1999) Probabilistic principal component analysis. J Royal Stat Soc B 61(3):611–622

  108. Titsias MK, Lawrence ND (2010) Bayesian Gaussian process latent variable model. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS-10), 9:844–851

  109. Hastie T, Stuetzle W (1989) Principal Curves. J Am Stat Assoc 84(406):502–516

  110. Utsugi A (1997) Hyperparameter selection for self-organizing maps. Neural Comp 9:623–635

  111. Utsugi A (2000) Bayesian sampling and ensemble learning in generative topographic mapping. Neural Process Lett 12(3):277–290

  112. Van Hulle M (2012) Self-organizing maps. In: Handbook of Natural Computing. Springer, Berlin, Heidelberg, pp 585–622

  113. Vellido A (2006) Assessment of an unsupervised feature selection method for generative topographic mapping. In: ICANN, pp 361–370

  114. Vellido A, El-Deredy W, Lisboa PJG (2003) Selective smoothing of the generative topographic mapping. IEEE Trans Neural Networks 14(4):847–852

  115. Verbeek JJ, Vlassis N, Krose BJA (2002) The generative self-organizing map: a probabilistic generalization of Kohonen’s SOM. Tech. rep., IAS-UVA-02-03

  116. Willenbockel CT, Schütte C (2015) A variational Bayesian algorithm for clustering of large and complex networks. Tech. Rep. 15-25, ZIB

  117. Yamaguchi N (2012) Variational Bayesian inference with automatic relevance determination for generative topographic mapping. In: SCIS-ISIS, pp 2124–2129

  118. Yin H (2008) The self-organizing maps: Background, theories, extensions and applications. In: Computational intelligence: a compendium, studies in computational intelligence, vol 115. Springer, Berlin, Heidelberg, pp 715–762

Author information

Corresponding author

Correspondence to Rodolphe Priam.

About this article

Cite this article

Priam, R., Nadif, M. Data visualization via latent variables and mixture models: a brief survey. Pattern Anal Applic 19, 807–819 (2016). https://doi.org/10.1007/s10044-015-0521-z

  • DOI: https://doi.org/10.1007/s10044-015-0521-z
