Abstract
Topic models are important tools for mining the latent topics of text. However, most existing topic models are derived from latent Dirichlet allocation (LDA), which requires the number of topics to be specified in advance. To mine the topics of Chinese micro-blogs automatically, we propose a nonparametric Bayesian model, named the HDP-TUB model, which is derived from the hierarchical Dirichlet process (HDP). In this model, we assume non-exchangeability of the data and use temporal information, user information, and topic tags (TUB) to address the sparsity problem caused by short texts. To construct the HDP-TUB model, the Chinese Restaurant Franchise (CRF) method is extended to integrate the temporal, user, and topic tag information. Experiments show that the HDP-TUB model outperforms the LDA and HDP models in terms of perplexity and the difference between topics.
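To illustrate why a Dirichlet-process-based model does not need the number of topics fixed in advance, the following is a minimal sketch of the standard single-level Chinese restaurant process, the building block that the Chinese Restaurant Franchise stacks hierarchically. It is not the HDP-TUB sampler from the paper: the function name, the concentration parameter `alpha`, and the seed are assumptions chosen for illustration, and no temporal, user, or tag information is modeled.

```python
import random


def chinese_restaurant_process(num_customers, alpha, seed=0):
    """Seat customers one by one (hypothetical helper, illustration only).

    Returns the table index of every customer and the final table sizes.
    Tables play the role of clusters (topics); their number is not fixed.
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of customers at table k
    assignments = []   # assignments[i] = table chosen by customer i
    for _ in range(num_customers):
        # An existing table k is chosen with weight table_sizes[k];
        # a new table is opened with weight alpha (assumed alpha > 0).
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[k] += 1        # join an existing table
        assignments.append(k)
    return assignments, table_sizes


assignments, table_sizes = chinese_restaurant_process(200, alpha=1.0)
print(len(table_sizes), "tables opened for", len(assignments), "customers")
```

Larger values of `alpha` tend to open more tables, so the effective number of clusters grows with the data instead of being set by hand as in LDA.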
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Zhang, Y., Yang, B., Yi, L., Liu, Y., Zhang, Y. (2018). HDP-TUB Based Topic Mining Method for Chinese Micro-blogs. In: Huang, X., Jiang, J., Zhao, D., Feng, Y., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2017. Lecture Notes in Computer Science, vol 10619. Springer, Cham. https://doi.org/10.1007/978-3-319-73618-1_75
DOI: https://doi.org/10.1007/978-3-319-73618-1_75
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73617-4
Online ISBN: 978-3-319-73618-1
eBook Packages: Computer Science, Computer Science (R0)