DOI: 10.1145/1150402.1150450
Article

Topics over time: a non-Markov continuous-time model of topical trends

Published: 20 August 2006

ABSTRACT

This paper presents an LDA-style topic model that captures not only the low-dimensional structure of data, but also how the structure changes over time. Unlike other recent work that relies on Markov assumptions or discretization of time, here each topic is associated with a continuous distribution over timestamps, and for each generated document, the mixture distribution over topics is influenced by both word co-occurrences and the document's timestamp. Thus, the meaning of a particular topic can be relied upon as constant, but the topics' occurrence and correlations change significantly over time. We present results on nine months of personal email, 17 years of NIPS research papers and over 200 years of presidential state-of-the-union addresses, showing improved topics, better timestamp prediction, and interpretable trends.
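
The generative story in the abstract can be illustrated with a minimal sketch. It assumes a Beta distribution over timestamps normalized to (0, 1) as the per-topic continuous time distribution, and it draws one timestamp per token; both are illustrative choices, not details stated in the abstract. The function names, the toy vocabulary of three word ids, and all parameter values are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_tokens, topic_word, topic_time, alpha, rng):
    """Generate (word_id, timestamp, topic) triples for one document.

    topic_word: per-topic word distributions (toy values, assumed)
    topic_time: per-topic (a, b) Beta parameters over normalized time (assumed)
    alpha:      Dirichlet prior over the document's topic mixture
    """
    theta = sample_dirichlet(alpha, rng)      # document's mixture over topics
    topic_ids = range(len(theta))
    word_ids = range(len(topic_word[0]))
    tokens = []
    for _ in range(n_tokens):
        z = rng.choices(topic_ids, weights=theta)[0]         # pick a topic
        w = rng.choices(word_ids, weights=topic_word[z])[0]  # word from topic
        a, b = topic_time[z]
        t = rng.betavariate(a, b)             # timestamp from the same topic
        tokens.append((w, t, z))
    return theta, tokens


rng = random.Random(0)
topic_word = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
topic_time = [(2.0, 8.0), (8.0, 2.0)]  # topic 0 peaks early, topic 1 peaks late
theta, tokens = generate_document(20, topic_word, topic_time, [0.5, 0.5], rng)
```

In this sketch each topic's word distribution stays fixed while its Beta parameters determine when the topic is likely to appear, which mirrors the abstract's claim that a topic's meaning is constant while its occurrence varies over time.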

      • Published in

        KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
        August 2006
        986 pages
        ISBN: 1595933395
        DOI: 10.1145/1150402

        Copyright © 2006 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
