Abstract
Thematic information in a long document (e.g., a novel) can be multi-faceted: an interleaving of multiple topics, a sequential evolution of a set of themes, or a crossing superimposition of topics and themes. Conventional topic-based visualization approaches are ill-suited to capturing this complicated thematic structure. This paper introduces a novel topic-based model, called the topic hypergraph, that characterizes the thematic structure of a long document with a hypergraph representation. Each hypergraph node represents a unique piece of the document and encodes its theme as a composition of multiple topics. Two types of relationships among nodes are modeled: an edge connects two consecutive themes to represent their sequential transition, and a hyperedge encodes a topic. The new representation is essentially a 2D reformulation of the linear streamgraph representation, and can be made adaptive by constructing a multi-level hierarchy. We design a suite of visualization and interaction tools that allow users to interactively analyze theme evolution, theme diversity, and topic interleaving. Our approach is also suitable for comparing multiple long documents.
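To make the representation concrete, the following is a minimal Python sketch of the structure described above, not the authors' implementation. The class names, the `threshold` rule for deciding which nodes a topic's hyperedge covers, and the averaging used to build a coarser level are all assumptions introduced for illustration; the abstract does not specify how these are computed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ThemeNode:
    """One document piece; its theme is a mixture of topic weights."""
    index: int
    topic_weights: Dict[str, float]


@dataclass
class TopicHypergraph:
    """Hypothetical container: nodes in reading order plus derived relations."""
    nodes: List[ThemeNode] = field(default_factory=list)

    def add_piece(self, topic_weights: Dict[str, float]) -> ThemeNode:
        # Append the next document piece in reading order.
        node = ThemeNode(len(self.nodes), dict(topic_weights))
        self.nodes.append(node)
        return node

    def sequential_edges(self) -> List[Tuple[int, int]]:
        # An ordinary edge links each theme to the one that follows it.
        return [(i, i + 1) for i in range(len(self.nodes) - 1)]

    def hyperedges(self, threshold: float = 0.2) -> Dict[str, Set[int]]:
        # One hyperedge per topic, covering every node whose weight for that
        # topic reaches the threshold (the threshold rule is an assumption).
        edges: Dict[str, Set[int]] = {}
        for node in self.nodes:
            for topic, weight in node.topic_weights.items():
                if weight >= threshold:
                    edges.setdefault(topic, set()).add(node.index)
        return edges

    def coarsen(self, group: int = 2) -> "TopicHypergraph":
        # Build a coarser level by merging runs of `group` consecutive nodes
        # and averaging their topic weights (an assumed merge rule).
        coarse = TopicHypergraph()
        for start in range(0, len(self.nodes), group):
            chunk = self.nodes[start:start + group]
            topics = {t for n in chunk for t in n.topic_weights}
            coarse.add_piece({
                t: sum(n.topic_weights.get(t, 0.0) for n in chunk) / len(chunk)
                for t in topics
            })
        return coarse


# Usage with made-up topic weights for three consecutive pieces.
g = TopicHypergraph()
g.add_piece({"childhood": 0.7, "romance": 0.3})
g.add_piece({"childhood": 0.1, "romance": 0.9})
g.add_piece({"romance": 0.5, "independence": 0.5})
print(g.sequential_edges())
print(g.hyperedges())
print(len(g.coarsen().nodes))
```

In this sketch the ordinary edges carry the sequential reading order, while each hyperedge groups all pieces that share a topic regardless of their position, which is what lets interleaved topics appear as overlapping sets rather than as parallel streams.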
Cite this article
Wang, G., Wen, C., Yan, B. et al. Topic hypergraph: hierarchical visualization of thematic structures in long documents. Sci. China Inf. Sci. 56, 1–14 (2013). https://doi.org/10.1007/s11432-013-4831-8
DOI: https://doi.org/10.1007/s11432-013-4831-8