CS-BTM: a semantics-based hot topic detection method for social network

Niu, Weinan; Tan, Wenan; Jia, Wei

doi:10.1007/s10489-022-03500-9

Published: 10 April 2022

Volume 52, pages 18187–18200, (2022)
Applied Intelligence

Weinan Niu¹,
Wenan Tan¹,² &
Wei Jia¹

Abstract

Social media platforms play a significant role in daily life and produce an enormous number of documents. Detecting the topics of these documents helps users quickly find the ones that interest them, so topic detection for microblogs is attracting increasing attention. The task is difficult for two main reasons. First, in topic extraction, microblog posts are short and sparse, and most existing algorithms, which are designed for long texts, do not handle them well. Second, most traditional text semantics models do not account for context semantics and polysemy, which may lead to inaccurate results. To address these two challenges, we propose CS-BTM (Context Semantics-based Biterm Topic Model), an improvement of the Biterm Topic Model (BTM). When counting biterm occurrences, CS-BTM uses the BERT model to mine biterms that are similar in context semantics, which helps identify the topic words of each topic. Moreover, by optimizing the Single-pass clustering algorithm, we propose a new algorithm that clusters the topics obtained by CS-BTM for topic detection. Experimental results indicate that the proposed method is effective for topic detection compared with several state-of-the-art methods.
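The two ideas named in the abstract, similarity-aware biterm counting and Single-pass clustering, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how biterm similarity is computed or how Single-pass is optimized. The sketch assumes that each word already has an embedding vector (small hand-made vectors stand in for BERT outputs here), that a biterm is represented by the mean of its two word vectors, and that two biterms count as similar when their cosine similarity reaches a threshold. The function names and both threshold values are hypothetical, and the clustering shown is the plain Single-pass scheme, not the optimized variant proposed in the paper.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def biterms(tokens):
    """Unordered pairs of distinct words co-occurring in one short text."""
    return [tuple(sorted(p)) for p in combinations(tokens, 2) if p[0] != p[1]]


def biterm_vector(biterm, word_vecs):
    """Assumption: represent a biterm by the mean of its two word vectors."""
    a, b = (word_vecs[w] for w in biterm)
    return [(x + y) / 2 for x, y in zip(a, b)]


def similarity_aware_counts(docs, word_vecs, threshold=0.95):
    """Count each biterm together with the occurrences of biterms similar to it."""
    raw = Counter(b for doc in docs for b in biterms(doc))
    vecs = {b: biterm_vector(b, word_vecs) for b in raw}
    return {
        b: sum(n for other, n in raw.items()
               if other == b or cosine(vecs[b], vecs[other]) >= threshold)
        for b in raw
    }


def single_pass(vectors, threshold=0.8):
    """Plain Single-pass clustering: join the most similar cluster or open a new one."""
    clusters = []  # each cluster keeps a running centroid and its member indices
    for i, v in enumerate(vectors):
        best, best_sim = None, threshold
        for c in clusters:
            s = cosine(v, c["centroid"])
            if s >= best_sim:
                best, best_sim = c, s
        if best is None:
            clusters.append({"centroid": list(v), "members": [i]})
        else:
            n = len(best["members"])
            # Update the centroid as the running mean of all member vectors.
            best["centroid"] = [(cv * n + x) / (n + 1)
                                for cv, x in zip(best["centroid"], v)]
            best["members"].append(i)
    return [c["members"] for c in clusters]


if __name__ == "__main__":
    # Toy stand-ins for contextual embeddings (hypothetical values).
    word_vecs = {
        "phone": [1.0, 0.0, 0.0], "mobile": [0.9, 0.1, 0.0],
        "battery": [0.0, 1.0, 0.0], "rain": [0.0, 0.0, 1.0],
        "storm": [0.0, 0.1, 0.9],
    }
    docs = [["phone", "battery"], ["mobile", "battery"], ["rain", "storm"]]
    print(similarity_aware_counts(docs, word_vecs))
    print(single_pass([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]))
```

In this toy setup the biterms (battery, phone) and (battery, mobile) reinforce each other's counts because their vectors are nearly parallel, which is the intuition behind counting semantically similar biterms together in sparse short texts.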

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China under Grants No. 61672022 and No. U1904186, and by the Key Disciplines of Software Engineering of Shanghai Polytechnic University under Grant No. XXKZD1604.

Author information

Authors and Affiliations

College of Computer Science & Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
Weinan Niu, Wenan Tan & Wei Jia
School of Computer and Information Engineering, Shanghai Polytechnic University, Shanghai, 200000, China
Wenan Tan

Corresponding author

Correspondence to Wenan Tan.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Niu, W., Tan, W. & Jia, W. CS-BTM: a semantics-based hot topic detection method for social network. Appl Intell 52, 18187–18200 (2022). https://doi.org/10.1007/s10489-022-03500-9

Accepted: 10 March 2022
Published: 10 April 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s10489-022-03500-9
