
Cartogram visualization for nonlinear manifold learning models

Data Mining and Knowledge Discovery

Abstract

Real-world applications of multivariate data analysis often stumble upon the barrier of interpretability. Simple data analysis methods are usually easy to interpret, but they risk providing poor data models. More involved methods may instead yield faithful data models, but offer limited interpretability. This is the case for linear and nonlinear methods for multivariate data visualization through dimensionality reduction. Even though the latter have provided some of the most exciting visualization developments, their practicality is hindered by the difficulty of explaining them in an intuitive manner. The interpretability, and therefore the practical applicability, of data visualization through nonlinear dimensionality reduction (NLDR) methods would improve if we could, first, accurately calculate the distortion that these methods introduce in the visual representation and, second, faithfully reintroduce this distortion into that representation. In this paper, we describe a technique for the reintroduction of the distortion into the visualization space of NLDR models. It is based on the concept of density-equalizing maps, or cartograms, recently developed for the representation of geographic information. We illustrate it using Generative Topographic Mapping (GTM), a nonlinear manifold learning method that can provide both multivariate data visualization and a measure of the local distortion that the model generates. Although illustrated here with GTM, it could easily be extended to other NLDR visualization methods, provided a local distortion measure could be calculated. It could also serve as a guiding tool for interactive data visualization.
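As a rough illustration of what a "local distortion measure" can look like, the sketch below estimates the area magnification of a smooth map from a 2-D visualization space into a higher-dimensional data space as sqrt(det(JᵀJ)), where J is the Jacobian of the map. This is only a minimal sketch under stated assumptions: `toy_mapping` is a hypothetical stand-in for a trained model's mapping, the Jacobian is approximated by finite differences rather than derived from model parameters, and nothing here reproduces the GTM implementation or the cartogram algorithm used in the paper.

```python
import numpy as np


def toy_mapping(u):
    """Hypothetical smooth map from 2-D latent space to 3-D data space.

    Stands in for a trained manifold model; it is not GTM.
    """
    x, y = u
    return np.array([x, y, x**2 + y**2])


def magnification(mapping, u, eps=1e-6):
    """Local area magnification sqrt(det(J^T J)) at latent point u.

    J is the Jacobian of `mapping`, approximated by central differences.
    """
    u = np.asarray(u, dtype=float)
    columns = []
    for k in range(u.size):
        step = np.zeros_like(u)
        step[k] = eps
        # Partial derivative of the mapping along latent axis k.
        columns.append((mapping(u + step) - mapping(u - step)) / (2 * eps))
    jacobian = np.stack(columns, axis=1)  # shape (D, 2)
    return float(np.sqrt(np.linalg.det(jacobian.T @ jacobian)))


def magnification_grid(mapping, n=5, lo=-1.0, hi=1.0):
    """Evaluate the magnification on a regular n-by-n latent grid."""
    axis = np.linspace(lo, hi, n)
    return np.array([[magnification(mapping, (x, y)) for x in axis]
                     for y in axis])


if __name__ == "__main__":
    print(np.round(magnification_grid(toy_mapping), 3))
```

A grid of values like this is the kind of per-cell quantity that a density-equalizing map could take as input, so that regions where the map stretches the data space more are drawn larger in the visualization.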



Author information

Corresponding author

Correspondence to Alfredo Vellido.

Additional information

Responsible editor: Barbara Hammer; Daniel Keim; Guy Lebanon; Neil Lawrence.

About this article

Cite this article

Vellido, A., García, D.L. & Nebot, À. Cartogram visualization for nonlinear manifold learning models. Data Min Knowl Disc 27, 22–54 (2013). https://doi.org/10.1007/s10618-012-0294-6


  • DOI: https://doi.org/10.1007/s10618-012-0294-6
