DOI: 10.1145/3554944.3554961
poster

A Benchmark for the Use of Topic Models for Text Visualization Tasks

Published: 31 October 2022

Abstract

Topic models are a widely used class of techniques for text analysis tasks. They rest on the assumption that semantic relatedness between documents is reflected in the distribution of the vocabulary. Applying a topic model yields a set of concepts, the so-called topics, and a high-dimensional description of the documents. For visualization tasks, these document descriptions can be projected onto a lower-dimensional space using dimensionality reduction techniques. Although the quality of the resulting point layout depends mainly on the chosen topic model and dimensionality reduction technique, it is unclear which particular combinations are suitable for displaying the semantic relatedness between the documents. In this work, we propose a benchmark comprising various datasets, layout algorithms and their hyperparameters, and quality metrics for conducting an empirical study.
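
The abstract does not name the quality metrics used in the benchmark. As an illustration only, the following minimal sketch assumes one plausible kind of metric: the share of each document's k nearest neighbors in the high-dimensional topic space that are still among its k nearest neighbors in the 2D layout. The function names, the choice of Euclidean distance, and the toy data are all hypothetical and not taken from the paper.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i] (Euclidean, excluding i)."""
    others = [j for j in range(len(points)) if j != i]
    # Ties are broken by index so the result is deterministic.
    others.sort(key=lambda j: (math.dist(points[i], points[j]), j))
    return set(others[:k])


def neighborhood_preservation(high_dim, layout, k):
    """Mean fraction of each document's k nearest neighbors shared by both spaces.

    Assumed metric for illustration; 1.0 means every neighborhood is preserved,
    0.0 means none is.
    """
    n = len(high_dim)
    if n != len(layout):
        raise ValueError("need exactly one layout point per document")
    if not 0 < k < n:
        raise ValueError("k must be between 1 and n - 1")
    shared = sum(len(knn(high_dim, i, k) & knn(layout, i, k)) for i in range(n))
    return shared / (k * n)


# Hypothetical document-topic vectors for four documents and three topics.
doc_topics = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.2, 0.7],
]

# Two hypothetical 2D layouts of the same four documents.
good_layout = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
bad_layout = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.1, 5.0)]

print(neighborhood_preservation(doc_topics, good_layout, k=1))
print(neighborhood_preservation(doc_topics, bad_layout, k=1))
```

In a benchmark of the kind described, a score like this would be computed for every combination of dataset, topic model, dimensionality reduction technique, and hyperparameter setting, so that the combinations can be compared on equal terms.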


Cited By

  • (2023) Large-Scale Evaluation of Topic Models and Dimensionality Reduction Methods for 2D Text Spatialization. IEEE Transactions on Visualization and Computer Graphics 30:1 (902-912). DOI: 10.1109/TVCG.2023.3326569. Online publication date: 23-Oct-2023

      Published In

      VINCI '22: Proceedings of the 15th International Symposium on Visual Information Communication and Interaction
      August 2022
      136 pages
      ISBN:9781450398060
      DOI:10.1145/3554944
      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Dimensionality Reduction Techniques
      2. Text Visualization
      3. Topic Modeling

      Qualifiers

      • Poster
      • Research
      • Refereed limited

      Conference

      VINCI'22

      Acceptance Rates

      Overall Acceptance Rate 71 of 193 submissions, 37%
