ABSTRACT
We are building a topic-based, interactive visual analytic tool that aids users in analyzing large collections of text. To help users quickly discover content evolution and significant content transitions within a topic over time, here we present a novel, constraint-based approach to temporal topic segmentation. Our solution splits a discovered topic into multiple linear, non-overlapping sub-topics along a timeline by satisfying a diverse set of semantic, temporal, and visualization constraints simultaneously. For each derived sub-topic, our solution also automatically selects a set of representative keywords to summarize the main content of the sub-topic. Our extensive evaluation, including a crowd-sourced user study, demonstrates the effectiveness of our method over an existing baseline.
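As a rough illustration of the idea of splitting one topic into linear, non-overlapping sub-topics and labeling each with keywords, the following minimal sketch uses a simple dynamic program. It is not the method of this paper: the cost function (squared deviation of keyword distributions from the segment mean), the `penalty` per segment, and the `max_segments` and `min_len` parameters are all assumptions that stand in for the semantic, temporal, and visualization constraints, and every function name is hypothetical.

```python
from collections import Counter


def normalize(counts, vocab):
    """Turn raw keyword counts for one time slice into relative frequencies."""
    total = sum(counts.values()) or 1
    return [counts.get(w, 0) / total for w in vocab]


def segment_cost(vectors, start, end):
    """Sum of squared distances of slices [start, end) to their mean vector."""
    n = end - start
    dims = len(vectors[0])
    mean = [sum(vectors[t][d] for t in range(start, end)) / n for d in range(dims)]
    return sum((vectors[t][d] - mean[d]) ** 2
               for t in range(start, end) for d in range(dims))


def segment_topic(slices, max_segments=3, min_len=1, penalty=0.1):
    """Split time-ordered slices into contiguous, non-overlapping segments.

    Assumed stand-ins for the constraints: max_segments and min_len limit how
    many and how short the segments may be; penalty discourages needless splits.
    """
    n = len(slices)
    if n < min_len or n == 0:
        raise ValueError("not enough time slices to form one segment")
    vocab = sorted({w for s in slices for w in s})
    vectors = [normalize(s, vocab) for s in slices]
    inf = float("inf")
    # best[k][j]: lowest cost of covering slices [0, j) with exactly k segments
    best = [[inf] * (n + 1) for _ in range(max_segments + 1)]
    back = [[None] * (n + 1) for _ in range(max_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, max_segments + 1):
        for j in range(min_len, n + 1):
            for i in range(0, j - min_len + 1):
                if best[k - 1][i] == inf:
                    continue
                cost = best[k - 1][i] + segment_cost(vectors, i, j) + penalty
                if cost < best[k][j]:
                    best[k][j] = cost
                    back[k][j] = i
    # Pick the segment count with the lowest total cost, then backtrack.
    k = min(range(1, max_segments + 1), key=lambda q: best[q][n])
    if best[k][n] == inf:
        raise ValueError("no segmentation satisfies the length constraints")
    bounds, j = [], n
    while k > 0:
        i = back[k][j]
        bounds.append((i, j))
        j, k = i, k - 1
    return bounds[::-1]


def top_keywords(slices, bounds, k=2):
    """Rank words by how much more frequent they are in a segment than overall."""
    overall = Counter()
    for s in slices:
        overall.update(s)
    total_all = sum(overall.values())
    result = []
    for start, end in bounds:
        seg = Counter()
        for s in slices[start:end]:
            seg.update(s)
        total = sum(seg.values())
        score = {w: seg[w] / total - overall[w] / total_all for w in seg}
        result.append(sorted(score, key=lambda w: (-score[w], w))[:k])
    return result


if __name__ == "__main__":
    # Hypothetical keyword counts for one topic over four time slices.
    demo = [
        {"budget": 5, "tax": 3},
        {"budget": 4, "tax": 4},
        {"election": 6, "vote": 2},
        {"election": 5, "vote": 3},
    ]
    segments = segment_topic(demo)
    print(segments, top_keywords(demo, segments))
```

In this sketch, the per-segment penalty plays the role of a visualization constraint by discouraging fragments too small to display, while the contiguity of the segments enforces the linear, non-overlapping layout along the timeline.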