Dynamic topic modeling via self-aggregation for short text streams

Peer-to-Peer Networking and Applications

Abstract

Social networks such as Twitter, Facebook, and Sina microblogs have become major sources for discovering and sharing the latest topics. Because topics on social networks change quickly, an effective method for modeling them is needed. Topic modeling of microblog streams is challenging, however, for two reasons: short texts are sparse, and topics change over time. In this study, we propose dynamic topic modeling via self-aggregation (SADTM), a method that captures the time-varying aspect of topic distributions and resolves the sparsity problem. To overcome the sparsity of short text, the SADTM aggregates the observed, unordered short texts into aggregated documents without relying on external context. Furthermore, we represent each microblog by word pairs instead of single words to generate more word co-occurrence patterns. The SADTM models temporal dynamics by using the topic distributions at previous time steps of the microblog stream to infer the current topics from sequential data. Extensive experiments on a real-world Sina microblog dataset demonstrate that the SADTM outperforms several state-of-the-art methods in topic coherence and cluster quality. Additionally, when applied in a search scenario, the SADTM significantly outperforms all baseline methods in the quality of the search results.
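The two ingredients named in the abstract, word pairs and the reuse of earlier topic distributions, can be illustrated with a minimal Python sketch. It is not the SADTM implementation, which the abstract does not specify. The function names (`word_pairs`, `pair_counts`, `prior_from_previous`), the parameters `strength` and `smoothing`, and the scaled-Dirichlet form of the prior are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def word_pairs(tokens):
    """Return every unordered word pair from one tokenized short text."""
    # Sort each pair so that (a, b) and (b, a) count as the same pattern.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def pair_counts(posts):
    """Count word pairs over a batch of tokenized posts."""
    counts = Counter()
    for tokens in posts:
        counts.update(word_pairs(tokens))
    return counts


def prior_from_previous(prev_topic_dist, strength=10.0, smoothing=0.1):
    """Hypothetical prior for the current time step.

    Assumption: the previous step's topic distribution is scaled by
    `strength` and smoothed, then used as Dirichlet pseudo-counts.
    The actual SADTM update rule is not given in the abstract.
    """
    return [strength * p + smoothing for p in prev_topic_dist]


# Two toy posts that share the words "topic" and "stream".
posts = [["topic", "model", "stream"], ["topic", "stream"]]
counts = pair_counts(posts)
prior = prior_from_previous([0.5, 0.5])
```

A post of n distinct words yields n(n-1)/2 pairs, so even a three-word post contributes three co-occurrence patterns where word-level counting would see only three isolated tokens.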

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants No. 61320106006, No. 61532006, and No. 61772083.

Corresponding author

Correspondence to Junping Du.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection: Special Issue on Big Data and Smart Computing in Network Systems

Guest Editors: Jiming Chen, Kaoru Ota, Lu Wang, and Jianping He

Cite this article

Shi, L., Du, J., Liang, M. et al. Dynamic topic modeling via self-aggregation for short text streams. Peer-to-Peer Netw. Appl. 12, 1403–1417 (2019). https://doi.org/10.1007/s12083-018-0692-7
