ABSTRACT
This paper presents an LDA-style topic model that captures not only the low-dimensional structure of data, but also how the structure changes over time. Unlike other recent work that relies on Markov assumptions or discretization of time, here each topic is associated with a continuous distribution over timestamps, and for each generated document, the mixture distribution over topics is influenced by both word co-occurrences and the document's timestamp. Thus, the meaning of a particular topic can be relied upon as constant, but the topics' occurrence and correlations change significantly over time. We present results on nine months of personal email, 17 years of NIPS research papers and over 200 years of presidential state-of-the-union addresses, showing improved topics, better timestamp prediction, and interpretable trends.
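The generative process sketched in the abstract — words drawn from per-topic word distributions, topics drawn from per-document mixtures, and timestamps drawn from a topic-specific continuous distribution (a Beta over normalized time) — can be illustrated with a minimal simulation. All dimensions, hyperparameters, and Beta shape values below are made-up assumptions for the sketch, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions chosen for illustration only.
n_topics, vocab_size, n_docs, doc_len = 3, 50, 4, 20

# LDA-style parameters: per-topic word distributions and per-document
# topic mixtures, both drawn from symmetric Dirichlet priors.
alpha, beta = 1.0, 0.1
phi = rng.dirichlet([beta] * vocab_size, size=n_topics)   # topic-word dists
theta = rng.dirichlet([alpha] * n_topics, size=n_docs)    # doc-topic mixtures

# Each topic also owns a continuous Beta distribution over timestamps
# normalized to [0, 1]; these (a, b) shapes are assumed, not estimated.
psi = np.array([[2.0, 8.0],    # topic 0: mass early in the time range
                [5.0, 5.0],    # topic 1: mass in the middle
                [8.0, 2.0]])   # topic 2: mass late

docs = []
for d in range(n_docs):
    words, times = [], []
    for _ in range(doc_len):
        z = rng.choice(n_topics, p=theta[d])              # draw a topic
        words.append(rng.choice(vocab_size, p=phi[z]))    # draw a word
        times.append(rng.beta(*psi[z]))                   # draw a timestamp
    docs.append((words, times))
```

Because each topic's timestamp distribution is continuous, no discretization of time or Markov chaining between time slices is needed; at inference time the timestamp likelihood simply reweights which topics a document is likely to use.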
Topics over time: a non-Markov continuous-time model of topical trends