DOI: 10.1145/1081870.1081895
Article

Discovering evolutionary theme patterns from text: an exploration of temporal text mining

Published: 21 August 2005

Abstract

Temporal Text Mining (TTM) is concerned with discovering temporal patterns in text information collected over time. Since most text information bears some time stamps, TTM has many applications in multiple domains, such as summarizing events in news articles and revealing research trends in scientific literature. In this paper, we study a particular TTM task -- discovering and summarizing the evolutionary patterns of themes in a text stream. We define this new text mining problem and present general probabilistic methods for solving this problem through (1) discovering latent themes from text; (2) constructing an evolution graph of themes; and (3) analyzing life cycles of themes. Evaluation of the proposed methods on two different domains (i.e., news articles and literature) shows that the proposed methods can discover interesting evolutionary theme patterns effectively.
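The abstract does not say how the evolution graph in step (2) is built. As a minimal sketch only, assume each theme is a word distribution tied to one time interval, and assume that a theme in one interval is linked to a theme in the next interval when the KL divergence between their distributions falls below a threshold. The function names, the threshold value, and the toy data below are all hypothetical and are not taken from the paper.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) between two word distributions stored as dicts.

    Words missing from q are floored at eps to avoid division by zero
    (a crude smoothing choice made for this sketch).
    """
    total = 0.0
    for word, pw in p.items():
        if pw > 0.0:
            total += pw * math.log(pw / max(q.get(word, 0.0), eps))
    return total

def build_evolution_graph(spans, threshold):
    """Link themes in adjacent time intervals whose distributions are close.

    spans: list, in time order, of dicts mapping theme name -> word distribution.
    Returns a list of edges ((t, earlier_theme), (t + 1, later_theme), divergence).
    """
    edges = []
    for t in range(len(spans) - 1):
        for earlier_name, earlier in spans[t].items():
            for later_name, later in spans[t + 1].items():
                # Assumption: measure how far the later theme is from the earlier one.
                d = kl_divergence(later, earlier)
                if d < threshold:
                    edges.append(((t, earlier_name), (t + 1, later_name), d))
    return edges

# Toy data (hypothetical): two themes in interval 0, one theme in interval 1.
spans = [
    {
        "aid": {"aid": 0.6, "relief": 0.3, "wave": 0.1},
        "quake": {"aid": 0.1, "relief": 0.1, "wave": 0.8},
    },
    {
        "donation": {"aid": 0.5, "relief": 0.4, "wave": 0.1},
    },
]

edges = build_evolution_graph(spans, threshold=0.5)
for source, target, divergence in edges:
    print(source, "->", target, round(divergence, 3))
```

With this toy data only the "aid" theme is close enough to "donation" to be linked; the "quake" theme is not. The threshold controls how dense the resulting graph is, so in practice it would need tuning on real data.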


Published In

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
August 2005
844 pages
ISBN: 159593135X
DOI: 10.1145/1081870

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. clustering
2. evolutionary theme patterns
3. temporal text mining
4. theme threads

Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
