DOI: 10.1145/2449396.2449441
research-article

Optimizing temporal topic segmentation for intelligent text visualization

Published: 19 March 2013

ABSTRACT

We are building a topic-based, interactive visual analytic tool that aids users in analyzing large collections of text. To help users quickly discover content evolution and significant content transitions within a topic over time, here we present a novel, constraint-based approach to temporal topic segmentation. Our solution splits a discovered topic into multiple linear, non-overlapping sub-topics along a timeline by satisfying a diverse set of semantic, temporal, and visualization constraints simultaneously. For each derived sub-topic, our solution also automatically selects a set of representative keywords to summarize the main content of the sub-topic. Our extensive evaluation, including a crowd-sourced user study, demonstrates the effectiveness of our method over an existing baseline.
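The abstract does not give the optimization itself, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a topic is represented as a list of per-time-slice word counts, and it uses a simple dynamic program as a stand-in for the constraint solver: within-segment dispersion plays the role of a semantic constraint, `min_len` a temporal one, and a per-segment `penalty` a rough proxy for a visualization constraint that discourages too many sub-topics. The function names, parameters, and scoring rules are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Dist = Dict[str, float]


def normalize(counts: Counter) -> Dist:
    """Turn raw word counts into a relative-frequency distribution."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def segment_cost(dists: List[Dist], start: int, end: int) -> float:
    """Sum of squared distances from each slice in [start, end) to the segment mean."""
    window = dists[start:end]
    vocab = set().union(*(d.keys() for d in window))
    cost = 0.0
    for w in vocab:
        vals = [d.get(w, 0.0) for d in window]
        mean = sum(vals) / len(vals)
        cost += sum((v - mean) ** 2 for v in vals)
    return cost


def segment_timeline(slices: List[Counter], min_len: int = 1,
                     penalty: float = 0.1) -> List[Tuple[int, int]]:
    """Split the timeline into contiguous, non-overlapping [start, end) segments.

    Assumed objective (not from the paper): total within-segment dispersion
    plus a fixed penalty per segment, with every segment at least min_len long.
    """
    dists = [normalize(s) for s in slices]
    n = len(dists)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[e]: minimal cost of segmenting slices[0:e]
    back = [0] * (n + 1)     # back[e]: start index of the last segment ending at e
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(0, end - min_len + 1):
            if best[start] == inf:
                continue
            cost = best[start] + segment_cost(dists, start, end) + penalty
            if cost < best[end]:
                best[end], back[end] = cost, start
    segments = []
    end = n
    while end > 0:           # walk the back-pointers from the end of the timeline
        segments.append((back[end], end))
        end = back[end]
    return segments[::-1]


def top_keywords(slices: List[Counter], segment: Tuple[int, int], k: int = 3) -> List[str]:
    """Rank words by how much more frequent they are in the segment than in the whole topic."""
    start, end = segment
    seg_d = normalize(sum(slices[start:end], Counter()))
    whole_d = normalize(sum(slices, Counter()))
    ranked = sorted(seg_d, key=lambda w: (-(seg_d[w] - whole_d.get(w, 0.0)), w))
    return ranked[:k]


if __name__ == "__main__":
    demo = [
        Counter(budget=3, tax=2), Counter(budget=2, tax=3), Counter(budget=3, tax=3),
        Counter(election=4, vote=2), Counter(election=3, vote=3), Counter(election=2, vote=4),
    ]
    for seg in segment_timeline(demo, min_len=2):
        print(seg, top_keywords(demo, seg, k=2))
```

The dynamic program guarantees linear, non-overlapping segments by construction; a real system would replace the dispersion cost and the flat penalty with whatever semantic, temporal, and visualization constraints it actually needs.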


    • Published in

      IUI '13: Proceedings of the 2013 international conference on Intelligent user interfaces
      March 2013
      470 pages
      ISBN:9781450319652
      DOI:10.1145/2449396

      Copyright © 2013 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      IUI '13 Paper Acceptance Rate: 43 of 192 submissions, 22%
      Overall Acceptance Rate: 746 of 2,811 submissions, 27%
