poster

A Benchmark for the Use of Topic Models for Text Visualization Tasks

Authors: Daniel Atzberger, Willy Scheibel, Daniel Limberger, Matthias Trapp, Jürgen Döllner

VINCI '22: Proceedings of the 15th International Symposium on Visual Information Communication and Interaction

Article No.: 17, Pages 1 - 4

https://doi.org/10.1145/3554944.3554961

Published: 31 October 2022

Abstract

Based on the assumption that semantic relatedness between documents is reflected in the distribution of the vocabulary, topic models are a widely used class of techniques for text analysis tasks. The application of topic models results in concepts, the so-called topics, and a high-dimensional description of the documents. For visualization tasks, the documents can be projected onto a lower-dimensional space using dimensionality reduction techniques. Though the quality of the resulting point layout mainly depends on the chosen topic model and dimensionality reduction technique, it is unclear which particular combinations are suitable for displaying the semantic relatedness between the documents. In this work, we propose a benchmark comprising various datasets, layout algorithms and their hyperparameters, and quality metrics for conducting an empirical study.
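
To make the idea of a benchmark over combinations more concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a benchmark can be seen as (a) a grid of topic model, layout algorithm, and hyperparameter combinations and (b) a quality metric that compares the high-dimensional document description with the 2D layout. The metric shown, k-nearest-neighbor preservation, is one common choice picked here for illustration; the abstract does not name the metrics used. The function names `benchmark_grid`, `knn`, and `neighborhood_preservation`, as well as all data values, are hypothetical.

```python
import math
from itertools import product


def benchmark_grid(topic_models, layouts):
    """Yield every (topic model, layout algorithm, hyperparameters) combination.

    `layouts` maps a layout algorithm name to a list of hyperparameter dicts.
    """
    for model, (layout, param_sets) in product(topic_models, layouts.items()):
        for params in param_sets:
            yield model, layout, params


def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i] (Euclidean distance).

    Ties are broken by index because (distance, index) tuples are sorted.
    """
    dists = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return {j for _, j in dists[:k]}


def neighborhood_preservation(high, low, k):
    """Mean fraction of each document's k high-dimensional neighbors
    that are also among its k neighbors in the low-dimensional layout."""
    n = len(high)
    shared = sum(len(knn(high, i, k) & knn(low, i, k)) for i in range(n))
    return shared / (n * k)


# Hypothetical document-topic vectors: two documents per topic cluster.
doc_topics = [(0.9, 0.1, 0.0), (0.8, 0.2, 0.0), (0.0, 0.1, 0.9), (0.1, 0.0, 0.9)]
# Two hypothetical 2D layouts: one keeps the clusters, one mixes them.
layout_good = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
layout_bad = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.1, 5.0)]

score_good = neighborhood_preservation(doc_topics, layout_good, k=1)
score_bad = neighborhood_preservation(doc_topics, layout_bad, k=1)

# Assumed example grid; the concrete techniques and values are illustrative.
grid = list(
    benchmark_grid(
        ["LSI", "LDA"],
        {"MDS": [{}], "t-SNE": [{"perplexity": 10}, {"perplexity": 30}]},
    )
)
```

In an empirical study of this kind, each entry of `grid` would be run on each dataset and scored with one or more such metrics, so that combinations can be compared on equal terms.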

Cited By

Atzberger D., Cech T., Trapp M., Richter R., Scheibel W., Döllner J., Schreck T. (2023). Large-Scale Evaluation of Topic Models and Dimensionality Reduction Methods for 2D Text Spatialization. IEEE Transactions on Visualization and Computer Graphics 30:1 (902-912). DOI: 10.1109/TVCG.2023.3326569. Online publication date: 23-Oct-2023
https://dl.acm.org/doi/10.1109/TVCG.2023.3326569

Index Terms

A Benchmark for the Use of Topic Models for Text Visualization Tasks
1. Human-centered computing
  1. Visualization
    1. Visualization application domains
      1. Information visualization
    2. Visualization techniques
      1. Treemaps

Published In

VINCI '22: Proceedings of the 15th International Symposium on Visual Information Communication and Interaction

August 2022

136 pages

ISBN: 9781450398060

DOI: 10.1145/3554944

Copyright © 2022 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Poster
Research
Refereed limited

Conference

VINCI'22

VINCI'22: 15th International Symposium on Visual Information Communication and Interaction

August 16 - 18, 2022

Chur, Switzerland

Acceptance Rates

Overall Acceptance Rate 71 of 193 submissions, 37%

Article Metrics

Total Citations: 1
Total Downloads: 94
Downloads (Last 12 months): 20
Downloads (Last 6 weeks): 3

Reflects downloads up to 15 Feb 2025
