Abstract
The thematic composition of document collections is commonly conceptualized as clusters in high-dimensional point clouds. Illustrating these clusters is challenging, however: typical visualizations such as colored projections or parallel coordinate plots suffer from feature occlusion and from noise that covers the whole visualization. We propose a method that avoids structural occlusion by using topology-based visualizations, which preserve the primary clustering features and neglect geometric properties that cannot be preserved in low-dimensional representations. By abstracting the input points as nested dense regions with individual properties, we provide the user with intuitive landscape visualizations that illustrate the high-dimensional clustering structure without occlusion.
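The idea of abstracting points as nested dense regions can be illustrated with a small sketch. It is not the chapter's implementation: the Gaussian kernel density estimate, the k-nearest-neighbor graph, and the names `gaussian_density`, `knn_graph`, and `dense_regions` are assumptions chosen for illustration. The sketch sweeps the points from high to low density and uses union-find to record where a dense region appears (a density maximum) and where it merges into a denser one.

```python
import math


def gaussian_density(points, bandwidth):
    """Unnormalized Gaussian kernel density estimate at each input point."""
    return [
        sum(math.exp(-math.dist(p, q) ** 2 / (2.0 * bandwidth ** 2)) for q in points)
        for p in points
    ]


def knn_graph(points, k):
    """Symmetric k-nearest-neighbor adjacency (an assumed neighborhood graph)."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def dense_regions(points, bandwidth=1.0, k=3):
    """Return (peak_index, peak_density, merge_density) per dense region.

    A region that never merges into a denser one gets merge_density 0.0.
    Regions are sorted from most to least persistent.
    """
    dens = gaussian_density(points, bandwidth)
    adj = knn_graph(points, k)
    parent, peak = {}, {}  # union-find forest; peak[root] = densest point

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    features = []
    for i in sorted(range(len(points)), key=lambda idx: -dens[idx]):
        parent[i], peak[i] = i, i
        for j in adj[i]:
            if j not in parent:  # neighbor has lower density, not swept yet
                continue
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            # The region with the lower peak is absorbed by the other one.
            hi, lo = (ri, rj) if dens[peak[ri]] > dens[peak[rj]] else (rj, ri)
            if peak[lo] != i:  # a real region ends here, not just point i
                features.append((peak[lo], dens[peak[lo]], dens[i]))
            parent[lo] = hi
    for root in {find(i) for i in range(len(points))}:
        features.append((peak[root], dens[peak[root]], 0.0))
    return sorted(features, key=lambda f: f[2] - f[1])


def cluster(center):
    """A center point plus unit offsets along every axis (toy data)."""
    pts = [tuple(center)]
    for axis in range(len(center)):
        for sign in (1.0, -1.0):
            p = list(center)
            p[axis] += sign
            pts.append(tuple(p))
    return pts


# Two 4-D toy clusters joined by one sparse bridge point.
points = cluster((0.0, 0.0, 0.0, 0.0)) + cluster((4.0, 0.0, 0.0, 0.0))
points.append((2.0, 0.0, 0.0, 0.0))
features = dense_regions(points, bandwidth=1.0, k=3)
for peak_index, birth, death in features:
    print(peak_index, round(birth, 3), round(death, 3))
```

In this toy data the two cluster centers are the only density maxima, and the two regions merge at the density of the bridge point, so one region is nested in the other below that level. A landscape view in the spirit of the abstract could draw each region as a hill whose height reflects its peak density, although that rendering step is outside this sketch.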
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Oesterling, P., Heine, C., Weber, G.H., Scheuermann, G. (2014). A Topology-Based Approach to Visualize the Thematic Composition of Document Collections. In: Biemann, C., Mehler, A. (eds) Text Mining. Theory and Applications of Natural Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-12655-5_4
DOI: https://doi.org/10.1007/978-3-319-12655-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12654-8
Online ISBN: 978-3-319-12655-5
eBook Packages: Computer Science (R0)