A Topology-Based Approach to Visualize the Thematic Composition of Document Collections

  • Chapter
Text Mining

Abstract

The thematic composition of document collections is commonly conceptualized as clusters in high-dimensional point clouds. Illustrating these clusters, however, is challenging: typical visualizations such as colored projections or parallel coordinate plots suffer from occluded features and from noise that covers the entire display. We propose a method that avoids structural occlusion by using topology-based visualizations, which preserve the primary clustering features and deliberately neglect geometric properties that cannot be preserved in low-dimensional representations. By abstracting the input points as nested dense regions with individual properties, we provide the user with intuitive landscape visualizations that depict the high-dimensional clustering structure without occlusion.
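The abstract does not spell out how the points become "nested dense regions", so the following is only a minimal sketch under our own assumptions, not the chapter's implementation. It evaluates an unnormalized Gaussian kernel density at every point, connects points with a symmetric k-nearest-neighbor graph (a stand-in for whatever neighborhood graph the authors use), and sweeps the points from high to low density with a union-find structure to build the join (merge) tree of the density's superlevel sets. Leaves of that tree are density maxima (dense regions); internal nodes are saddles where regions merge into larger, less dense ones. The function names and the `bandwidth` and `k` parameters are hypothetical.

```python
import math


def gaussian_density(points, bandwidth):
    """Unnormalized Gaussian kernel density estimate, evaluated at every input point.

    O(n^2) brute force; adequate for a small illustration only.
    """
    densities = []
    for p in points:
        total = 0.0
        for q in points:
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        densities.append(total)
    return densities


def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph, used here as a stand-in neighborhood graph."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n)
            if j != i
        )
        for _, j in dists[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def join_tree(density, adj):
    """Join (merge) tree of the density's superlevel sets.

    Vertices are swept from highest to lowest density. A vertex with no
    already-swept neighbor starts a new dense region (a maximum); a vertex that
    touches two or more swept regions is a saddle where those regions merge.
    Returns (maxima, arcs), where each arc is (child_node, saddle_node).
    """
    order = sorted(range(len(density)), key=lambda v: (-density[v], v))
    parent = {}  # union-find forest over the vertices swept so far

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    head = {}  # union-find root -> tree node (maximum or saddle) heading that region
    maxima, arcs = [], []
    for v in order:
        parent[v] = v
        roots = {find(u) for u in adj[v] if u in parent}
        if not roots:
            maxima.append(v)  # a new dense region appears
            head[v] = v
            continue
        heads = sorted(head.pop(r) for r in roots)
        for r in roots:
            parent[r] = v  # v becomes the root of the merged component
        if len(heads) > 1:
            arcs.extend((h, v) for h in heads)  # regions join at saddle v
            head[v] = v
        else:
            head[v] = heads[0]  # regular vertex: the existing region simply grows
    return maxima, arcs
```

For example, for seven points on a line at x = 0, 0.1, 0.2, 1.5, 2.8, 2.9, 3.0 (two tight groups bridged by the sparse point with index 3), `bandwidth=0.5` and `k=2` yield maxima 1 and 5, both joined at vertex 3. A landscape metaphor would then draw one hill per maximum, nested according to the tree; that rendering step is not shown here.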

Notes

  1. The diagrams in Figs. 2 and 3 can be magnified arbitrarily in the electronic version of this article.


Author information

Corresponding author

Correspondence to Patrick Oesterling.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Oesterling, P., Heine, C., Weber, G.H., Scheuermann, G. (2014). A Topology-Based Approach to Visualize the Thematic Composition of Document Collections. In: Biemann, C., Mehler, A. (eds) Text Mining. Theory and Applications of Natural Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-12655-5_4

  • DOI: https://doi.org/10.1007/978-3-319-12655-5_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12654-8

  • Online ISBN: 978-3-319-12655-5

  • eBook Packages: Computer Science, Computer Science (R0)
