Topic hypergraph: hierarchical visualization of thematic structures in long documents

  • Research Paper
  • Special Focus
Science China Information Sciences

Abstract

The thematic information of a long document (e.g., a novel) can be multi-faceted: an interleaving of multiple topics, a sequential evolution of a set of themes, or a crossing superimposition of topics and themes. Conventional topic-based visualization approaches capture this complicated thematic structure poorly. This paper introduces a novel topic-based model, called the topic hypergraph, that characterizes the thematic structure of a long document with a hypergraph representation. Each hypergraph node represents a unique document piece and encodes its theme as a composition of multiple topics. Two types of relationships among nodes are modeled: an edge, which connects two consecutive themes to present their sequential transition, and a hyperedge, which encodes a topic. The new representation is essentially a 2D reformulation of the linear streamgraph representation, and it can be made adaptive by constructing a multi-level hierarchy. We design a suite of visualization and interaction tools that allow users to interactively analyze theme evolution, theme diversity, and topic interleaving. Our approach is also suitable for comparing multiple long documents.
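
The structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one node per document piece in reading order, represents each theme as a mapping from topic name to weight, and treats a topic's hyperedge as the set of nodes whose theme gives that topic at least a cutoff weight. The class name `TopicHypergraph`, the `threshold` parameter, and the sample topics are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class TopicHypergraph:
    """Illustrative container: node i is the i-th document piece."""

    # themes[i] maps topic name -> weight for piece i (assumed representation)
    themes: List[Dict[str, float]]
    # Hypothetical cutoff deciding when a topic counts as present in a piece
    threshold: float = 0.2

    def edges(self) -> List[Tuple[int, int]]:
        """Ordinary edges: each piece is linked to the piece that follows it."""
        return [(i, i + 1) for i in range(len(self.themes) - 1)]

    def hyperedges(self) -> Dict[str, Set[int]]:
        """One hyperedge per topic: all nodes where the topic is prominent."""
        result: Dict[str, Set[int]] = {}
        for node, theme in enumerate(self.themes):
            for topic, weight in theme.items():
                if weight >= self.threshold:
                    result.setdefault(topic, set()).add(node)
        return result


# Three made-up pieces with made-up topic weights
graph = TopicHypergraph(themes=[
    {"love": 0.7, "religion": 0.3},
    {"love": 0.1, "religion": 0.9},
    {"love": 0.6, "class": 0.4},
])

print(graph.edges())       # [(0, 1), (1, 2)]
print(graph.hyperedges())  # topic -> set of node indices
```

In this sketch the edges carry the sequential transition between consecutive themes, while each hyperedge groups every piece that touches a given topic, so a topic that recurs in non-adjacent pieces (here "love" in pieces 0 and 2) shows up as a single hyperedge spanning them.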

Author information

Corresponding author

Correspondence to Wei Chen.

About this article

Cite this article

Wang, G., Wen, C., Yan, B. et al. Topic hypergraph: hierarchical visualization of thematic structures in long documents. Sci. China Inf. Sci. 56, 1–14 (2013). https://doi.org/10.1007/s11432-013-4831-8
