Topic evolution based on the probabilistic topic model: a review

  • Review Article
  • Frontiers of Computer Science

Abstract

Accurately representing the quantity and characteristics of users’ interest in certain topics is an important problem facing topic evolution researchers, particularly in modern online environments. Search engines can retrieve information on a specified topic from archived data, but they fail to reflect, in a structured way, how interest in the topic changes over time. This paper reviews notable research from the past decade on topic evolution based on the probabilistic topic model. First, we introduce the notation, terminology, and basic topic model used in the survey. We then summarize three categories of topic evolution based on the probabilistic topic model: the discrete time topic evolution model, the continuous time topic evolution model, and the online topic evolution model. Next, we describe applications of the topic evolution model, summarize methods for evaluating model generalization performance and topic evolution, and provide comparative experimental results for different models. To conclude the review, we pose some open questions and discuss possible future research directions.

Acknowledgements

The authors would like to thank the anonymous reviewers for their constructive comments and suggestions, which significantly improved the manuscript. This work was supported by the National Key Basic Research Project of China (973 Program) (2012CB316400), the National Natural Science Foundation of China (Grant Nos. 61471321, 61202400, 31300539, and 31570629), the Zhejiang Provincial Natural Science Foundation of China (LY15C140005, LY16F010004), the Science and Technology Department of Zhejiang Province Public Welfare Project (2016C31G2010057, 2015C31004), the Fundamental Research Funds for the Central Universities (172210261), and the Zhejiang Provincial Key Laboratory of Forestry Intelligent Monitoring and Information Technology Research.

Author information

Corresponding author

Correspondence to Huimin Yu.

Additional information

Houkui Zhou is a PhD student in the Department of Information Science and Electronic Engineering, Zhejiang University (ZJU), China. He received his bachelor's degree from Hangzhou Dianzi University, China in 2003 and his master's degree from the Department of Information Science and Electronic Engineering, ZJU in 2006. His research interests include cross-media analysis and mining, and topic evolution.

Huimin Yu received the PhD degree in communication and electronic systems from the Department of Information Science and Electronic Engineering, Zhejiang University (ZJU), China in 1996. He is currently a professor with the Department of Information Science and Electronic Engineering and the State Key Laboratory of CAD&CG, ZJU. His current research interests include cross-media data mining and analysis, and machine learning.

Roland Hu received the BS degree in electrical engineering from Tsinghua University, China, and the PhD degree in audio-visual person recognition from the University of Southampton, UK in 2002 and 2007, respectively. He was a postdoctoral researcher with the Communications and Remote Sensing Laboratory, Université Catholique de Louvain, Belgium from 2007 to 2009. Since 2009, he has been an assistant professor with the Department of Information Science and Electronic Engineering, Zhejiang University, China. His current research interests include computer vision, image processing, pattern recognition, and digital watermarking.

About this article

Cite this article

Zhou, H., Yu, H. & Hu, R. Topic evolution based on the probabilistic topic model: a review. Front. Comput. Sci. 11, 786–802 (2017). https://doi.org/10.1007/s11704-016-5442-5

  • DOI: https://doi.org/10.1007/s11704-016-5442-5
