ABSTRACT
This paper presents an LDA-style topic model that captures not only the low-dimensional structure of data, but also how the structure changes over time. Unlike other recent work that relies on Markov assumptions or discretization of time, here each topic is associated with a continuous distribution over timestamps, and for each generated document, the mixture distribution over topics is influenced by both word co-occurrences and the document's timestamp. Thus, the meaning of a particular topic can be relied upon as constant, but the topics' occurrence and correlations change significantly over time. We present results on nine months of personal email, 17 years of NIPS research papers and over 200 years of presidential state-of-the-union addresses, showing improved topics, better timestamp prediction, and interpretable trends.
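The generative process sketched in the abstract — words drawn from per-topic word distributions, topics drawn from per-document mixtures, and timestamps drawn from a topic-specific continuous distribution (a Beta over normalized time) — can be illustrated with a minimal simulation. All dimensions, hyperparameters, and Beta shape values below are made-up assumptions for the sketch, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions chosen for illustration only.
n_topics, vocab_size, n_docs, doc_len = 3, 50, 4, 20

# LDA-style parameters: per-topic word distributions and per-document
# topic mixtures, both drawn from symmetric Dirichlet priors.
alpha, beta = 1.0, 0.1
phi = rng.dirichlet([beta] * vocab_size, size=n_topics)   # topic-word dists
theta = rng.dirichlet([alpha] * n_topics, size=n_docs)    # doc-topic mixtures

# Each topic also owns a continuous Beta distribution over timestamps
# normalized to [0, 1]; these (a, b) shapes are assumed, not estimated.
psi = np.array([[2.0, 8.0],    # topic 0: mass early in the time range
                [5.0, 5.0],    # topic 1: mass in the middle
                [8.0, 2.0]])   # topic 2: mass late

docs = []
for d in range(n_docs):
    words, times = [], []
    for _ in range(doc_len):
        z = rng.choice(n_topics, p=theta[d])              # draw a topic
        words.append(rng.choice(vocab_size, p=phi[z]))    # draw a word
        times.append(rng.beta(*psi[z]))                   # draw a timestamp
    docs.append((words, times))
```

Because each topic's timestamp distribution is continuous, no discretization of time or Markov chaining between time slices is needed; at inference time the timestamp likelihood simply reweights which topics a document is likely to use.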
Topics over time: a non-Markov continuous-time model of topical trends