Abstract
Real-world applications of multivariate data analysis often run into the barrier of interpretability. Simple data analysis methods are usually easy to interpret, but they risk providing poor data models. More involved methods may instead yield faithful data models, at the cost of limited interpretability. This is the case for linear and nonlinear methods for multivariate data visualization through dimensionality reduction. Even though the latter have provided some of the most exciting visualization developments, their practicality is hindered by the difficulty of explaining them in an intuitive manner. The interpretability, and therefore the practical applicability, of data visualization through nonlinear dimensionality reduction (NLDR) methods would improve if we could, first, accurately calculate the distortion introduced by these methods in the visual representation and, second, faithfully reintroduce this distortion into that representation. In this paper, we describe a technique for reintroducing the distortion into the visualization space of NLDR models. It is based on the concept of density-equalizing maps, or cartograms, recently developed for the representation of geographic information. We illustrate it using Generative Topographic Mapping (GTM), a nonlinear manifold learning method that can provide both multivariate data visualization and a measure of the local distortion that the model generates. Although illustrated here with GTM, the technique could easily be extended to other NLDR visualization methods, provided that a local distortion measure can be calculated. It could also serve as a guiding tool for interactive data visualization.
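As a rough illustration of what a local distortion measure can look like, the sketch below computes an area magnification factor, sqrt(det(J^T J)), for a smooth map from a 2-D visualization space into a 3-D data space, where J is the Jacobian of the map. The mapping `toy_mapping`, the function `magnification_factor`, and the finite-difference step `eps` are assumptions made for illustration only; this is not the GTM model or the implementation used in the paper.

```python
import numpy as np


def toy_mapping(u):
    """Hypothetical smooth map from a 2-D latent point to 3-D data space."""
    x, y = u
    return np.array([x, y, x**2 + y**2])


def magnification_factor(mapping, u, eps=1e-5):
    """Local area magnification sqrt(det(J^T J)) at latent point u.

    The Jacobian J is estimated with central finite differences, so this
    works for any smooth mapping passed in (an assumption of this sketch).
    """
    u = np.asarray(u, dtype=float)
    columns = []
    for k in range(u.size):
        step = np.zeros_like(u)
        step[k] = eps
        # Partial derivative of the mapping along latent axis k.
        columns.append((mapping(u + step) - mapping(u - step)) / (2 * eps))
    J = np.stack(columns, axis=1)  # shape: (data_dim, latent_dim)
    return float(np.sqrt(np.linalg.det(J.T @ J)))


if __name__ == "__main__":
    # Evaluate the distortion on a small grid of latent points.
    for point in [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.0, 1.0)]:
        print(point, round(magnification_factor(toy_mapping, point), 4))
```

For this toy surface the factor equals sqrt(1 + 4x^2 + 4y^2), so it is 1 at the origin, where the map is locally flat, and grows where the surface stretches. A cartogram would then enlarge the regions of the visualization space in proportion to such values.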
Additional information
Responsible editor: Barbara Hammer; Daniel Keim; Guy Lebanon; Neil Lawrence.
Cite this article
Vellido, A., García, D.L. & Nebot, À. Cartogram visualization for nonlinear manifold learning models. Data Min Knowl Disc 27, 22–54 (2013). https://doi.org/10.1007/s10618-012-0294-6
DOI: https://doi.org/10.1007/s10618-012-0294-6