Abstract
Social network platforms play a significant role in daily life and produce an enormous number of documents. Detecting the topics of these documents helps users quickly find the documents that interest them. Consequently, topic detection for microblogs is receiving increasing attention. However, it is a difficult task because of two major challenges. First, in terms of topic extraction, microblog text is short and sparse, so most existing algorithms, which were designed for long texts, cannot handle it well. Second, most traditional text-semantics processing models do not consider context semantics and polysemy, which may lead to inaccurate results. To address these two challenges, we propose a new model, CS-BTM (Context Semantics-based Biterm Topic Model), which improves on BTM (Biterm Topic Model). When counting biterm occurrences, CS-BTM uses the BERT model to mine biterms that are similar in context semantics, which helps detect the topic words of each topic. Moreover, by optimizing the Single-pass clustering algorithm, we propose a new algorithm that clusters the topics obtained by CS-BTM for topic detection. Experiments show that the proposed method performs well in topic detection compared with several state-of-the-art methods.
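To make the two building blocks named above concrete, the following is a minimal sketch of plain biterm counting and of the baseline Single-pass clustering procedure. It is not the CS-BTM model or the optimized clustering algorithm proposed in the paper: the BERT-based matching of similar biterms is omitted, and the function names (`extract_biterms`, `count_biterms`, `single_pass`), the cosine similarity measure, the running-mean centroid update, and the threshold value are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
import math


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short document."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(docs):
    """Count biterm occurrences over a corpus of tokenized documents."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def single_pass(vectors, threshold):
    """Baseline Single-pass clustering (assumed form, not the paper's variant).

    Each vector is compared with the centroids of the clusters seen so far.
    It joins the most similar cluster if the similarity reaches `threshold`;
    otherwise it starts a new cluster. Returns lists of member indices.
    """
    centroids, members = [], []
    for idx, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            centroids.append(list(vec))
            members.append([idx])
        else:
            members[best].append(idx)
            n = len(members[best])
            # Update the centroid as the running mean of its members.
            centroids[best] = [
                (m * (n - 1) + x) / n for m, x in zip(centroids[best], vec)
            ]
    return members


if __name__ == "__main__":
    docs = [["topic", "model", "short"], ["short", "topic"]]
    print(count_biterms(docs))
    # Hypothetical 2-dimensional topic vectors, for illustration only.
    print(single_pass([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], threshold=0.8))
```

Because Single-pass clustering visits each item once and never revisits earlier assignments, the result depends on input order and on the threshold, which is one reason a method might refine the baseline.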
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grants No. 61672022 and No. U1904186, and by the Key Disciplines of Software Engineering of Shanghai Polytechnic University under Grant No. XXKZD1604.
Cite this article
Niu, W., Tan, W. & Jia, W. CS-BTM: a semantics-based hot topic detection method for social network. Appl Intell 52, 18187–18200 (2022). https://doi.org/10.1007/s10489-022-03500-9
DOI: https://doi.org/10.1007/s10489-022-03500-9