Abstract
To overcome the data sparsity and diverse expression of short texts and to improve clustering quality, this paper proposes a text feature enhancement method based on the biterm topic model (BTM). First, we use BTM to extract the latent topics of the corpus and obtain a matrix of each topic's high-frequency words. We then use this matrix to selectively strengthen the traditional vector space model (VSM), which reduces the vector dimension and highlights the main features. We also propose a heat calculation equation that combines the propagation characteristics and the time effect of micro-blogs, so that the evolution of a topic can be better demonstrated and analyzed. Experiments show that our method achieves good results in improving clustering quality. They also show that the heat calculation equation helps discover hot topics and trace their evolution.
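The feature enhancement step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a BTM has already been trained, with topic proportions `theta` (a list) and topic-word distributions `phi` (one dict per topic mapping a word to P(w | z)). Biterm extraction and the inference of P(z | d) follow the original BTM formulation of Yan et al. (WWW 2013). The enhancement rule itself is hypothetical, as are the names `top_words`, `enhanced_vsm`, `n`, `boost` and the toy model values. The vector is restricted to the union of every topic's top-n words, which reduces the dimension. The counts of words belonging to the text's dominant topic are multiplied by 1 + boost · P(z | d), which highlights the main features.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return the biterms of one short text: every unordered pair of token
    positions, stored as a sorted word tuple so (a, b) and (b, a) coincide."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def doc_topic_distribution(tokens, theta, phi):
    """Infer P(z | d) from a trained BTM.

    P(z | b) is proportional to theta[z] * phi[z][w_i] * phi[z][w_j], and P(z | d)
    averages P(z | b) over the text's biterms. Biterms unseen by every topic are
    skipped; a text with no usable biterm gets a uniform distribution.
    """
    k = len(theta)
    p_zd = [0.0] * k
    for wi, wj in extract_biterms(tokens):
        scores = [theta[z] * phi[z].get(wi, 0.0) * phi[z].get(wj, 0.0) for z in range(k)]
        total = sum(scores)
        if total > 0.0:
            for z in range(k):
                p_zd[z] += scores[z] / total
    norm = sum(p_zd)
    return [p / norm for p in p_zd] if norm > 0.0 else [1.0 / k] * k


def top_words(phi, n):
    """High-frequency word matrix: the n most probable words of each topic."""
    return [sorted(row, key=row.get, reverse=True)[:n] for row in phi]


def enhanced_vsm(tokens, theta, phi, n=3, boost=1.0):
    """Hypothetical selective enhancement of a term-frequency VSM vector."""
    topic_words = top_words(phi, n)
    # Reduced feature space: union of all topics' high-frequency words.
    features = sorted({w for row in topic_words for w in row})
    tf = Counter(tokens)
    p_zd = doc_topic_distribution(tokens, theta, phi)
    dominant = max(range(len(p_zd)), key=p_zd.__getitem__)
    factor = 1.0 + boost * p_zd[dominant]
    main = set(topic_words[dominant])
    # Only words of the text's dominant topic are up-weighted (hypothetical rule).
    return features, [tf[w] * factor if w in main else float(tf[w]) for w in features]


# Invented two-topic toy model, for illustration only.
THETA = [0.5, 0.5]
PHI = [
    {"earthquake": 0.4, "rescue": 0.3, "magnitude": 0.2, "match": 0.05, "goal": 0.03, "team": 0.02},
    {"match": 0.4, "goal": 0.3, "team": 0.2, "earthquake": 0.05, "rescue": 0.03, "magnitude": 0.02},
]

features, vector = enhanced_vsm(["earthquake", "rescue", "rescue"], THETA, PHI, n=2)
print(dict(zip(features, vector)))
```

Running it prints a four-dimensional vector. Only the words of the text's dominant (earthquake) topic are non-zero and up-weighted. Because the boost scales with P(z | d), a text that BTM assigns confidently to one topic has its main features emphasized more strongly than an ambiguous one.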
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Feng, J., Fang, Y. (2017). Research on Hot Topic Discovery Technology of Micro-blog Based on Biterm Topic Model. In: Yuan, H., Geng, J., Bian, F. (eds) Geo-Spatial Knowledge and Intelligence. GRMSE 2016. Communications in Computer and Information Science, vol 699. Springer, Singapore. https://doi.org/10.1007/978-981-10-3969-0_27
DOI: https://doi.org/10.1007/978-981-10-3969-0_27
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3968-3
Online ISBN: 978-981-10-3969-0
eBook Packages: Computer Science, Computer Science (R0)