Abstract
In the literature, data visualization is extensively studied via diverse parametric probabilistic distributions for the exploration of continuous, binary, and count data. An overview of the existing methods for non-symmetric data matrices is presented in a unified framework via the Bernoulli law and binary variables. An extension to continuous or count variables is available by using any other univariate distribution instead, such as the Poisson or the Gaussian. Several approaches are possible, depending on whether the model places a distribution on the rows, the columns, the row clusters, the column clusters, the cells, the blocks, or a transformed matrix of the distances between pairs of rows or columns. The objective functions are presented with their full expressions in separate sections, one for each method: Kohonen’s map and related self-organizing map methods, generative topographic mapping as a probabilistic self-organizing map, linear principal component analysis and related matrix methods (non-negative factorization, factorization), probabilistic parametric embedding, probabilistic latent semantic visualization, latent cluster position model, and t-distributed stochastic neighbor embedding. The conclusion discusses the contribution and perspectives.
Notes
Note that an optimal parameter \(\lambda\) may be found with a measure of the quality of the mapping as proposed in [20].
This distribution illustrates well the double purpose of a visual representation. Here, only the central positions are visualized, and the corresponding sampled data are simply shown at the same coordinates. This can be summarized as (1) a hard (non-soft) clustering, followed by (2) a projection of the cluster means. This is the global part, which yields a skeleton of the data cloud. The local part comes with the fuzzification, when the data vectors scatter around the cluster means.
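The two-step reading above (hard clustering, then projection of the cluster means) can be sketched as follows. This is only an illustrative sketch and not the model discussed in the text: k-means stands in for the hard clustering and a linear PCA projection stands in for the mapping, both of which are assumptions, as are the function name `hard_cluster_then_project` and its parameters.

```python
import numpy as np

def hard_cluster_then_project(X, n_clusters=3, n_iter=20, seed=0):
    """Hypothetical helper: k-means style hard clustering, then a 2-D
    PCA projection of the cluster means (assumed stand-ins, see lead-in)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centers on randomly chosen distinct rows of X.
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()

    def assign(centers):
        # Squared Euclidean distance of every row to every center: (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    # Step 1: hard clustering (each row belongs to exactly one cluster).
    for _ in range(n_iter):
        labels = assign(centers)
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[k] = members.mean(axis=0)
    labels = assign(centers)

    # Step 2: project the cluster means on the first two principal axes.
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    positions = (centers - mean) @ vt[:2].T  # (k, 2) "skeleton" of the cloud

    # Each data vector is displayed at the position of its own cluster.
    return labels, positions, positions[labels]

# Usage on a small random binary matrix (synthetic data, for illustration).
rng = np.random.default_rng(1)
X = (rng.random((60, 8)) < 0.4).astype(int)
labels, positions, coords = hard_cluster_then_project(X, n_clusters=3)
```

The returned `positions` play the role of the global skeleton. A soft (fuzzy) assignment would instead place each data vector at a weighted average of these positions, which corresponds to the local part mentioned above.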
The R package VBLPCM was used for training the model. Convergence was observed after fewer than 61 steps.
The representation has to preserve sufficiently well (1) the local neighborhood relations in the data cloud, so that similar data are found in the same area of the map, and (2) the global relations that give the data cloud its shape, so that the view conveys its overall appearance and the relative distances between the different sub-structures. When classes exist, the highest possible separation of their projections may be preferred, without losing the visual information on the nearest neighbors of the projected points.
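As a rough illustration of these two requirements, the sketch below computes one local and one global indicator for a given projection. Both indicators are simple assumed stand-ins (k-nearest-neighbor overlap and correlation of pairwise distances), not the quality measures cited in the references, and the function names are hypothetical.

```python
import numpy as np

def pairwise_distances(Z):
    """Euclidean distance matrix between the rows of Z."""
    Z = np.asarray(Z, dtype=float)
    return np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2))

def knn_indices(Z, k):
    """Indices of the k nearest neighbors of each row (self excluded)."""
    d = pairwise_distances(Z)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def neighbor_preservation(X, Y, k=5):
    """Local indicator: mean fraction of the k nearest neighbors in the
    data space X that are also k nearest neighbors in the map Y."""
    nx, ny = knn_indices(X, k), knn_indices(Y, k)
    overlap = [len(set(a) & set(b)) for a, b in zip(nx, ny)]
    return float(np.mean(overlap)) / k

def pairwise_distance_agreement(X, Y):
    """Global indicator: Pearson correlation between the pairwise
    distances in the data space and in the map."""
    iu = np.triu_indices(len(X), k=1)
    return float(np.corrcoef(pairwise_distances(X)[iu],
                             pairwise_distances(Y)[iu])[0, 1])

# Usage: compare synthetic data with a crude 2-D view (first two columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
Y = X[:, :2]
local_score = neighbor_preservation(X, Y, k=5)
global_score = pairwise_distance_agreement(X, Y)
```

A projection can score well on one indicator and poorly on the other, which is why both aspects are named separately in the note above.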
References
Ambroise C, Govaert G (1996) Constrained clustering and kohonen self-organizing maps. J Classif 13(2):299–313
Anouar F, Badran F, Thiria S (1997) Self organizing map: a probabilistic approach. In: Proceedings of the WSOM'97, Finland, pp 339–344
Bacciu D, Micheli A, Sperduti A (2012) Compositional generative mapping for tree-structured data - part I: bottom-up probabilistic modeling of trees. IEEE Trans Neural Netw Learn Syst 23(12):1987–2002
Baek J, McLachlan G, Flack LK (2009) Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualisation of high-dimensional data. IEEE Trans Pattern Anal Mach Intell 32(7):1298–1309
Bakker R, Poole KT (2013) Bayesian metric multidimensional scaling. Polit Anal 21(1):125–140
Banerjee A, Dhillon IS, Ghosh J, Sra S (2005) Clustering on the unit hypersphere using von Mises-Fisher distributions. J Mach Learn Res 6:1345–1382
Basseville M (2013) Divergence measures for statistical data processing—An annotated bibliography. Signal Process 93(4):621–633
Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comp 15(6):1373–1396
Bilmes JA (1998) A gentle tutorial of the EM algorithm and its application to parameter estimation for gaussian mixture and hidden markov models. Tech. rep., ICSI, U.C. Berkeley
Bingham E, Kabán A, Fortelius M (2009) The aspect Bernoulli model: multiple causes of presences and absences. Pat Anal Appl 12(1):55–78
Bingham E, Mannila H (2001) Random projection in dimensionality reduction: Applications to image and text data. In: KDD ’01: Proceedings of the 7th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, pp 245–250
Bishop C, Svensén M, Williams CKI (1997) Magnification factors for the gtm algorithm. In: Fifth international conference on artificial neural networks, pp 64–69
Bishop CM, Svensén M, Williams CKI (1997) GTM: a principled alternative to the self-organizing map. In: Advances in neural information processing systems 9, pp 354–360
Bishop CM, Svensén M, Williams CKI (1998) Developments of the generative topographic mapping. Neurocomputing 21:203–224
Carreira-Perpiñán MA, Lu Z (2007) The Laplacian eigenmaps latent variable model. In: Proceedings of the Eleventh international conference on artificial intelligence and statistics (AISTATS-07), pp 59–66
Carreira-Perpiñán MA (2010) The elastic embedding algorithm for dimensionality reduction. In: Proceedings of the 27th international conference on machine learning (ICML '10), pp 167–174
Carter KM, Raich R, Hero AO (2008) FINE: Information embedding for document classification. In: ICASSP, pp 1861–1864
Chaibi A, Lebbah M, Azzag H (2013) A new bi-clustering approach using topological maps. In: Neural Networks (IJCNN), pp 1–7
Chang KY, Ghosh J (2001) A unified model for probabilistic principal surfaces. IEEE Trans Pattern Anal Mach Intell 23(1):22–41
Chen L, Buja A (2009) Local multidimensional scaling for nonlinear dimension reduction, graph drawing, and proximity analysis. J Am Stat Assoc 104(485):209–219
Choi JY, Qiu J, Pierce M, Fox G (2010) Generative topographic mapping by deterministic annealing. Procedia Comp Sci 1(1):47–56
Cruz-Barbosa R, Vellido A (2010) Semi-supervised geodesic generative topographic mapping. Pattern Recogn Lett 31(3):202–209
Daudin JJ, Picard F, Robin S (2008) A mixture model for random graphs. Stat Comp 18(2):173–183
Deerwester S, Dumais ST, Furnas GW, Landauer T, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Info Sci 41(6):391–407
Dempster AP, Laird NM, Rubin DB (1977) Maximum-likelihood from incomplete data via the EM algorithm. J Royal Stat Soc B 39:1–38
Ding C, Li T, Peng W (2008) On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing. Comp Stat Data Anal 52:3913–3927
Estévez PA, Figueroa CJ, Saito K (2005) Special issue: Cross-entropy embedding of high-dimensional data using the neural gas model. Neural Netw 18(5–6):727–737
Févotte C, Bertin N, Durrieu JL (2009) Nonnegative matrix factorization with the Itakura-Saito divergence: with application to music analysis. Neural Comp 21(3):793–830
Fort JC, Letremy P, Cottrell M (2002) Advantages and drawbacks of the batch kohonen algorithm. In: ESANN, pp 223–230
Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comp J 41(8):578–588
Girolami M (2001) The topographic organization and visualization of binary data using multivariate-bernoulli latent variable models. IEEE Trans Neural Netw 12(6):1367–1374
Gisbrecht A, Mokbel B, Hammer B (2011) Relational generative topographic mapping. Neurocomputing 74(9):1359–1371
Goldberger J, Roweis S, Hinton G, Salakhutdinov R (2005) Neighbourhood components analysis. In: Advances in neural information processing systems 17, pp 513–520
Gollini I, Murphy TB (2014) Mixture of latent trait analyzers for model-based clustering of categorical data. Stat Comp 24(4):569–588
Govaert G (1983) Classification croisée. Thèse d’état, Université Paris 6, France
Govaert G (1995) Simultaneous clustering of rows and columns. Control Cybern 24(4):437–458
Govaert G, Nadif M (2003) Clustering with block mixture models. Patt Recogn 36(2):463–473
Govaert G, Nadif M (2005) An EM algorithm for the block mixture model. IEEE Trans Pattern Anal Mach Intell 27(4):643–647
Gupta MR, Chen Y (2011) Theory and use of the em algorithm. Found Trends Signal Process 4(3):223–296
Handcock MS, Raftery AE, Tantrum JM (2007) Model-based clustering for social networks. J Royal Stat Soc A 170(2):301–354
Hathaway RJ (1986) Another interpretation of the EM algorithm for mixture distributions. Stat Probab Lett 4(2):53–56
Hernández-Lobato JM, Houlsby N, Ghahramani Z (2014) Stochastic inference for scalable probabilistic modeling of binary matrices. In: Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp 379–387
Hinton G, Roweis S (2003) Stochastic neighbor embedding. In: Advances in Neural Information Processing Systems 15, pp 857–864
Hoff PD, Raftery AE, Handcock MS (2002) Latent space approaches to social network analysis. J Am Stat Assoc 97(460):1090–1098
Hofmann T (2000) ProbMap - a probabilistic approach for mapping large document collections. Intell Data Anal 4(2):149–164
Hofmann T, Puzicha J (1998) Statistical models for co-occurrence data. Tech. Rep. AIM-1625, MIT
Hofmann T, Schölkopf B, Smola AJ (2008) Kernel methods in machine learning. Ann Stat 36(3):1171–1220
Hsu CC (2006) Generalizing self-organizing map for categorical data. IEEE Trans Neural Netw 17(2):294–304
Iwata T, Saito K, Ueda N, Stromsten S, Griffiths TL, Tenenbaum JB (2007) Parametric embedding for class visualization. Neural Comp 19(9):2536–2556
Iwata T, Yamada T, Ueda N (2008) Probabilistic latent semantic visualization: topic model for visualizing documents. In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’08, New York, pp 363–371
Jolliffe I (2002) Principal component analysis. Springer Verlag
Juan A, Vidal E (2002) On the use of bernoulli mixture models for text classification. Patt Recogn 35(12):2705–2710
Kabán A (2007) Predictive modelling of heterogeneous sequence collections by topographic ordering of histories. Mach Learn 68(1):63–95
Kabán A, Bingham E, Hirsimäki T (2004) Learning to read between the lines: the aspect bernoulli model. In: Proceedings of the 2004 SIAM International Conference on Data Mining (SIAM DM04), pp 462–466
Kabán A, Girolami M (2001) A combined latent class and trait model for analysis and visualisation of discrete data. IEEE Trans Pattern Anal Mach Intell 23(8):859–872
Kabán A, Sun J, Raychaudhury S, Nolan L (2006) On class visualisation for high dimensional data: Exploring scientific data sets. In: Proceedings of the 9th International Conference on Discovery Science, DS’06, Springer-Verlag pp 125–136
Kiang MY (2001) Extending the kohonen, self-organizing map networks for clustering analysis. Comp Stat Data Anal 38(2):161–180
Klock H, Buhmann JM (2000) Data visualization by multidimensional scaling: a deterministic annealing approach. Patt Recogn 33(4):651–669
Kohonen T (1997) Self-organizing maps. Springer
Kozma L, Ilin A, Raiko T (2009) Binary principal component analysis in the netflix collaborative filtering task. In: IEEE International Workshop on Machine Learning for Signal Processing, pp 1–6
Lawrence ND (2005) Probabilistic non-linear principal component analysis with gaussian process latent variable models. J Mach Learn Res 6:1783–1816
Lawrence ND (2012) A unifying probabilistic perspective for spectral dimensionality reduction: Insights and new models. J Mach Learn Res 13:1609–1638
Le TV, Lauw HW (2014) Manifold learning for jointly modeling topic and visualization. In: Twenty-Eighth AAAI Conference on Artificial Intelligence, pp 1960–1967
Le TV, Lauw HW (2014) Probabilistic latent document network embedding. In: ICDM, pp 270–279
Lebart L, Morineau A, Warwick K (1984) Multivariate descriptive statistical analysis. Wiley
Lebbah M, Rogovschi N, Bennani Y (2007) BeSOM: Bernoulli on self-organizing map. In: IJCNN, pp 631–636
Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. In: Advances in Neural Information Processing Systems 13, pp 556–562
Lee JA, Renard E, Bernard G, Dupont P, Verleysen M (2013) Type 1 and 2 mixtures of Kullback-Leibler divergences as cost functions in dimensionality reduction based on similarity preservation. Neurocomputing 112:92–108
Lee JA, Verleysen M (2009) Quality assessment of dimensionality reduction: Rank-based criteria. Neurocomputing 72(7–9):1431–1443
Lee JA, Verleysen M (2010) Scale-independent quality criteria for dimensionality reduction. Pattern Recogn Lett 31(14):2248–2257
Lee S, Huang JZ, Hu J (2010) Sparse logistic principal components analysis for binary data. Ann Appl Stat 4(3):1579–1601
López-Rubio E (2010) Probabilistic self-organizing maps for qualitative data. Neural Netw 23(10):1208–1225
Luttrell SP (1994) A bayesian analysis of self-organising maps. Neural Comp 6(5):767–794
van der Maaten L, Hinton G (2008) Visualizing Data using t-SNE. J Mach Learn Res 9:2579–2605
Makarenkov V, Legendre P (2002) Nonlinear Redundancy Analysis and Canonical Correspondence Analysis Based on Polynomial Regression. Ecology 83(4):1146–1161
Maniyar D, Nabney I (2006) Data visualization with simultaneous feature selection. In: Computational Intelligence and Bioinformatics and Computational Biology CIBCB ’06, pp 1–8
McLachlan GJ, Basford KE (1988) Mixture Models. Inference and applications to clustering. Marcel Dekker, New York
McLachlan GJ, Peel D (2000) Finite Mixture Models. Wiley, New York
Mirisaee SH, Gaussier E, Termier A (2015) Improved local search for binary matrix factorization. In: Twenty-Ninth AAAI Conference on Artificial Intelligence, pp 1198–1204
Noack A (2003) Energy models for drawing clustered small-world graphs. Tech. rep, BTU Cottbus
Oh MS, Raftery AE (2001) Bayesian Multidimensional Scaling and Choice of Dimension. J Am Stat Assoc 96(455):1031–1044
Olier I, Vellido A (2008) Advances in clustering and visualization of time series using GTM through time. Neural Netw 21(7):904–913
Olier I, Vellido A (2008) Variational bayesian generative topographic mapping. J Math Model Algor 7(4):371–387
Olier I, Vellido A, Giraldo J (2010) Kernel generative topographic mapping. In: ESANN, pp 481–486
Paatero P (1997) Least squares formulation of robust non-negative factor analysis. Chemom Intell Lab Syst 37(1):23–35
Park M, Jitkrittum W, Qamar A, Szabo Z, Buesing L, Sahani M (2015) Bayesian Manifold Learning: Locally Linear Latent Variable Model (LL-LVM). Arxiv preprint. http://arxiv.org/pdf/1410.6791v3.pdf
Priam R (2005) CASOM: Som for contingency tables and biplot. In: Proceedings of the WSOM'05, Paris, pp 379–386
Priam R, Nadif M (2012) Generative topographic mapping and factor analyzers. In: ICPRAM (1), pp 284–287
Priam R, Nadif M, Govaert G (2014) Topographic bernoulli block mixture mapping for binary tables. Patt Anal Appl 17(4):839–847
Priam R, Nadif M, Govaert G (2015) Generalized topographic block model. Neurocomputing (in press)
Roweis S, Ghahramani Z (1999) A unifying review of linear gaussian models. Neural Comp 11(2):305–345
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326
Salakhutdinov R, Mnih A (2008) Probabilistic matrix factorization. In: Advances in Neural Information Processing Systems 20, pp 1257–1264
Salter-Townshend M, Murphy TB (2013) Variational bayesian inference for the latent position cluster model for network data. Comp Stat Data Anal 57(1):661–671
Salter-Townshend M, White A, Gollini I, Murphy TB (2012) Review of statistical network analysis: models, algorithms, and software. Stat Anal Data Mining 5(4):243–264
Sammon J (1969) A nonlinear mapping for data structure analysis. IEEE Trans Comp C-18(5):401–409
Schein AI, Saul LK, Ungar LH (2003) A Generalized Linear Model for Principal Component Analysis of Binary Data. In: Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (AISTATS-9), pp 14–21
Silvestre C, Cardoso M, Figueiredo M (2014) Identifying the number of clusters in discrete mixture models. Arxiv preprint. http://arxiv.org/pdf/1409.7419.pdf
Singh AP, Gordon GJ (2008) A unified view of matrix factorization models. In: ECML PKDD, LNAI 5212, pp 358–373
Stulp F, Sigaud O (2015) Many regression algorithms, one unified model: a review. Neural Netw 69:60–79
Sun S (2013) A review of deterministic approximate inference techniques for bayesian machine learning. Neural Comp Appl 23(7):2039–2050
Symons MJ (1981) Clustering criteria and multivariate normal mixtures. Biometrics 37:35–43
Tino P, Nabney I (2002) Hierarchical GTM: constructing localized nonlinear projection manifolds in a principled way. IEEE Trans Pattern Anal Mach Intell 24(5):639–656
Tino P, Nabney I, Sun Y (2001) Using directional curvatures to visualize folding patterns of the GTM projection manifolds. In: ICANN, pp 421–428
Tipping ME (1999) Probabilistic visualisation of high-dimensional binary data. In: Advances in Neural Information Processing Systems 11, pp 592–598
Tipping ME, Bishop CM (1999) Mixtures of probabilistic principal component analyzers. Neural Comp 11(2):443–482
Tipping ME, Bishop CM (1999) Probabilistic principal component analysis. J Royal Stat Soc B 61(3):611–622
Titsias MK, Lawrence ND (2010) Bayesian gaussian process latent variable model. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS-10), 9:844–851
Hastie T, Stuetzle W (1989) Principal Curves. J Am Stat Assoc 84(406):502–516
Utsugi A (1997) Hyperparameter selection for self-organizing maps. Neural Comp 9:623–635
Utsugi A (2000) Bayesian sampling and ensemble learning in generative topographic mapping. Neural Process Lett 12(3):277–290
Van Hulle M (2012) Self-organizing maps. In: Handbook of Natural Computing. Springer, Berlin, Heidelberg, pp 585–622
Vellido A (2006) Assessment of an unsupervised feature selection method for generative topographic mapping. In: ICANN, pp 361–370
Vellido A, El-Deredy W, Lisboa PJG (2003) Selective smoothing of the generative topographic mapping. IEEE Trans Neural Networks 14(4):847–852
Verbeek JJ, Vlassis N, Krose BJA (2002) The generative self-organizing map: a probabilistic generalization of Kohonen’s SOM. Tech. rep., IAS-UVA-02-03
Willenbockel CT, Schütte C (2015) A variational bayesian algorithm for clustering of large and complex networks. Tech. Rep. 15-25, ZIB
Yamaguchi N (2012) Variational bayesian inference with automatic relevance determination for generative topographic mapping. In: SCIS-ISIS, pp 2124–2129
Yin H (2008) The self-organizing maps: Background, theories, extensions and applications. In: Computational intelligence: a compendium, studies in computational intelligence, vol 115. Springer, Berlin, Heidelberg, pp 715–762
Cite this article
Priam, R., Nadif, M. Data visualization via latent variables and mixture models: a brief survey. Pattern Anal Applic 19, 807–819 (2016). https://doi.org/10.1007/s10044-015-0521-z