CS-BTM: a semantics-based hot topic detection method for social network

Applied Intelligence

Abstract

Social network media play a significant role in daily life and have produced an enormous number of documents. Detecting the topics of these documents helps users quickly find the documents that interest them, so topic detection for microblogs is attracting increasing attention. However, it is a difficult task because of two major challenges. First, in topic extraction, microblog text is short and sparse, so most existing algorithms, which were designed for long text, do not handle it well. Second, most traditional text semantics models do not take context semantics and polysemy into account, which may lead to inaccurate results. To address these two challenges, we propose CS-BTM (Context Semantics-based Biterm Topic Model), a new model that improves BTM (Biterm Topic Model). When counting the occurrences of biterms, CS-BTM uses the BERT model to mine biterms that are similar in context semantics, which helps detect the topic words of each topic. Moreover, by optimizing the Single-pass clustering algorithm, we propose a new algorithm that clusters the topics obtained by CS-BTM for topic detection. Experimental verification shows that the proposed method is effective for topic detection when compared with several state-of-the-art methods.
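
The two building blocks named in the abstract, biterm counting and Single-pass clustering, can be illustrated with a minimal Python sketch. It shows only the standard baseline versions and is not the authors' implementation: the BERT-based matching of semantically similar biterms and the optimized clustering step are not reproduced, because the abstract does not specify them. The function names (`extract_biterms`, `count_biterms`, `cosine`, `single_pass`), the `threshold` parameter, and the choice of cosine similarity are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
import math


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short document."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(docs):
    """Count exact biterm occurrences over tokenized short texts.

    Assumption: plain exact-match counting, as in baseline BTM. CS-BTM
    additionally credits contextually similar biterms; that is not shown.
    """
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def single_pass(vectors, threshold):
    """Baseline Single-pass clustering (hypothetical helper).

    Each vector joins the most similar existing cluster if the similarity
    reaches `threshold`; otherwise it starts a new cluster.
    """
    centroids, sizes, labels = [], [], []
    for vec in vectors:
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__) if sims else -1
        if best >= 0 and sims[best] >= threshold:
            n = sizes[best]
            # Update the running mean of the chosen cluster.
            centroids[best] = [(c * n + x) / (n + 1)
                               for c, x in zip(centroids[best], vec)]
            sizes[best] = n + 1
            labels.append(best)
        else:
            centroids.append(list(vec))
            sizes.append(1)
            labels.append(len(centroids) - 1)
    return labels


docs = [["apple", "phone", "launch"], ["phone", "launch"]]
biterm_counts = count_biterms(docs)
cluster_labels = single_pass([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], threshold=0.8)
```

In a pipeline of the kind the abstract describes, the vectors passed to `single_pass` would be topic representations produced by the topic model rather than the toy two-dimensional vectors used here.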

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China under Grants No. 61672022 and No. U1904186, and by the Key Disciplines of Software Engineering of Shanghai Polytechnic University under Grant No. XXKZD1604.

Author information

Corresponding author

Correspondence to Wenan Tan.

Cite this article

Niu, W., Tan, W. & Jia, W. CS-BTM: a semantics-based hot topic detection method for social network. Appl Intell 52, 18187–18200 (2022). https://doi.org/10.1007/s10489-022-03500-9

  • DOI: https://doi.org/10.1007/s10489-022-03500-9
