research-article

Sequential Modeling of Topic Dynamics with Multiple Timescales

Published: 01 February 2012

Abstract

We propose an online topic model for sequentially analyzing the time evolution of topics in document collections. Topics naturally evolve on multiple timescales. For example, some words may be used consistently over a hundred years, while others emerge and disappear within a few days. In the proposed model, therefore, the current topic-specific distributions over words are assumed to be generated from the multiscale word distributions of the previous epoch. Modeling both long- and short-timescale dependencies yields a more robust model. We derive efficient online inference procedures based on a stochastic EM algorithm in which the model is updated sequentially with newly obtained data, so past data are not needed for inference. We demonstrate the effectiveness of the proposed method in terms of predictive performance and computational efficiency on real document collections with timestamps.
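The abstract combines two ideas: a prior for each topic's current word distribution built from that topic's word distributions at several timescales, and an online stochastic EM procedure that processes one epoch of documents at a time. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. The class name `MultiscaleTopicTracker`, the uniform "scale 0" component, the window of 2**(s-1) epochs for scale s, the collapsed Gibbs sweep used as the stochastic E-step, and the Minka-style fixed-point update for the scale weights `lam` are all hypothetical choices made for illustration. The sketch retains only per-epoch topic-word count tables, never past documents.

```python
import random
from collections import deque


class MultiscaleTopicTracker:
    """Hypothetical sketch of an online topic tracker with multiscale priors.

    Assumed model (illustrative, not the paper's exact equations):
      xi[k][0]         uniform distribution over the vocabulary
      xi[k][s], s > 0  topic k's word distribution pooled over the last 2**(s-1) epochs
      prior for the current epoch: alpha[k][v] = sum_s lam[k][s] * xi[k][s][v]
    """

    def __init__(self, n_topics, vocab_size, n_scales=3, doc_alpha=0.1, seed=0):
        self.K, self.V, self.S = n_topics, vocab_size, n_scales
        self.doc_alpha = doc_alpha
        self.rng = random.Random(seed)
        # lam[k][s]: weight (and precision) of scale s in topic k's prior, s = 0..S.
        self.lam = [[1.0] * (n_scales + 1) for _ in range(n_topics)]
        # Only per-epoch topic-word count tables are kept, never documents.
        self.history = deque(maxlen=2 ** (n_scales - 1))

    def _multiscale_dists(self, k):
        uniform = [1.0 / self.V] * self.V
        dists = [uniform]
        recent = list(self.history)[::-1]  # most recent epoch first
        for s in range(1, self.S + 1):
            window = recent[: 2 ** (s - 1)]
            totals = [sum(c[k][v] for c in window) for v in range(self.V)]
            n = sum(totals)
            dists.append([x / n for x in totals] if n > 0 else uniform)
        return dists

    def _prior(self, xi):
        alpha = [[sum(l * x[v] for l, x in zip(self.lam[k], xi[k]))
                  for v in range(self.V)] for k in range(self.K)]
        return alpha, [sum(row) for row in alpha]

    def fit_epoch(self, docs, n_iter=20):
        """Stochastic EM on one epoch; docs is a list of lists of word ids."""
        K, V = self.K, self.V
        xi = [self._multiscale_dists(k) for k in range(K)]
        z = [[self.rng.randrange(K) for _ in doc] for doc in docs]
        ndk = [[0] * K for _ in docs]
        nkv = [[0] * V for _ in range(K)]
        nk = [0] * K
        for d, doc in enumerate(docs):
            for w, k in zip(doc, z[d]):
                ndk[d][k] += 1
                nkv[k][w] += 1
                nk[k] += 1
        for _ in range(n_iter):
            alpha, asum = self._prior(xi)
            # Stochastic E-step: one collapsed Gibbs sweep over this epoch's tokens.
            for d, doc in enumerate(docs):
                for i, w in enumerate(doc):
                    k = z[d][i]
                    ndk[d][k] -= 1
                    nkv[k][w] -= 1
                    nk[k] -= 1
                    weights = [(ndk[d][j] + self.doc_alpha)
                               * (nkv[j][w] + alpha[j][w]) / (nk[j] + asum[j])
                               for j in range(K)]
                    k = self.rng.choices(range(K), weights=weights)[0]
                    z[d][i] = k
                    ndk[d][k] += 1
                    nkv[k][w] += 1
                    nk[k] += 1
            # M-step: Minka-style fixed-point update of lam[k][s]; for integer n,
            # digamma(a + n) - digamma(a) == sum(1 / (a + i) for i in range(n)).
            for k in range(K):
                denom = sum(1.0 / (asum[k] + i) for i in range(nk[k]))
                if denom == 0:
                    continue  # topic unused this epoch: keep its weights
                new = []
                for s, x in enumerate(xi[k]):
                    num = sum(x[v] * sum(1.0 / (alpha[k][v] + i) for i in range(nkv[k][v]))
                              for v in range(V))
                    new.append(max(self.lam[k][s] * num / denom, 1e-6))  # keep weights positive
                self.lam[k] = new
        self.history.append([row[:] for row in nkv])
        alpha, asum = self._prior(xi)
        # Posterior-mean estimate of each topic's word distribution for this epoch.
        return [[(nkv[k][v] + alpha[k][v]) / (nk[k] + asum[k]) for v in range(V)]
                for k in range(K)]
```

Usage, under the same assumptions: create `tracker = MultiscaleTopicTracker(n_topics=2, vocab_size=4, n_scales=2)` and call `tracker.fit_epoch(docs)` once per epoch in time order, where `docs` holds only that epoch's documents as lists of integer word ids; each call returns one estimated word distribution per topic. The uniform scale-0 component keeps every prior entry positive, so words never seen in a topic's history still receive nonzero probability.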

Cited By

  • (2022) New Event Detection for Web Recommendation Using Web Mining. Machine Intelligence and Soft Computing, 153-157. DOI: 10.1007/978-981-16-8364-0_19. Online publication date: 22-Feb-2022.
  • (2021) Distributed Latent Dirichlet Allocation on Streams. ACM Transactions on Knowledge Discovery from Data 16:1, 1-20. DOI: 10.1145/3451528. Online publication date: 20-Jul-2021.
  • (2021) Time Window Topic Model for Analyzing Customer Browsing Behavior. 2021 IEEE 12th International Workshop on Computational Intelligence and Applications (IWCIA), 1-8. DOI: 10.1109/IWCIA52852.2021.9626043. Online publication date: 6-Nov-2021.

      Information

      Published In

      ACM Transactions on Knowledge Discovery from Data, Volume 5, Issue 4
      February 2012
      176 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/2086737
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 01 February 2012
      Accepted: 01 September 2011
      Revised: 01 June 2011
      Received: 01 January 2011
      Published in TKDD Volume 5, Issue 4

      Author Tags

      1. Topic model
      2. multiscale
      3. online learning
      4. time-series analysis

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 5
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 14 Feb 2025

      Citations

      • (2021) Deep dynamic neural networks for temporal language modeling in author communities. Knowledge and Information Systems. DOI: 10.1007/s10115-020-01539-z. Online publication date: 13-Jan-2021.
      • (2020) Infinite Mixtures of Gaussian Process Experts with Latent Variables and its Application to Terminal Location Estimation from Multiple-Sensor Values. Intelligent Systems and Applications, 315-330. DOI: 10.1007/978-3-030-55190-2_24. Online publication date: 25-Aug-2020.
      • (2019) Learning Dynamic Author Representations with Temporal Language Models. 2019 IEEE International Conference on Data Mining (ICDM), 120-129. DOI: 10.1109/ICDM.2019.00022. Online publication date: Nov-2019.
      • (2019) Emerging research topics detection with multiple machine learning models. Journal of Informetrics 13:4, 100983. DOI: 10.1016/j.joi.2019.100983. Online publication date: Nov-2019.
      • (2018) A framework for semantic connection based topic evolution with DeepWalk. Intelligent Data Analysis 22:1, 211-237. DOI: 10.3233/IDA-163282. Online publication date: 22-Feb-2018.
      • (2018) Interactive System Using LDA for Exploratory Visualization to Extract Data Association in a Data Lake. 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 172-177. DOI: 10.1109/SMC.2018.00040. Online publication date: Oct-2018.
      • (2016) Fast Sampling for Time-Varying Determinantal Point Processes. ACM Transactions on Knowledge Discovery from Data 11:1, 1-24. DOI: 10.1145/2943785. Online publication date: 20-Jul-2016.