HDP-TUB Based Topic Mining Method for Chinese Micro-blogs

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10619)

Abstract

Topic models are important tools for mining the latent topics of text. However, most existing topic models are derived from latent Dirichlet allocation (LDA), which requires the number of topics to be specified in advance. To mine the topics of Chinese micro-blogs automatically, we propose a nonparametric Bayesian model, named the HDP-TUB model, which is derived from the hierarchical Dirichlet process (HDP). In this model, we assume non-exchangeability of the data and use temporal information, user information, and topic tags (TUB) to address the sparsity problem caused by short texts. To construct the HDP-TUB model, the Chinese Restaurant Franchise (CRF) method is extended to integrate the temporal information, user information, and topic tag information. Experiments show that the HDP-TUB model outperforms the LDA model and the HDP model in terms of perplexity and the difference between topics.
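Perplexity, one of the two evaluation criteria named above, can be illustrated with a minimal sketch. It assumes the standard definition (the exponential of the negative mean log-likelihood per held-out token), which may differ in detail from the formula used in the paper. The function name `perplexity` and the toy input are hypothetical, chosen only for illustration.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities of held-out text.

    Assumes the standard definition exp(-mean log p(w)); lower is better.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_neg_log_lik = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_neg_log_lik)


# Hypothetical toy input: a model that assigns every one of 200 held-out
# tokens the same probability 1/50. Its perplexity is approximately 50,
# i.e. the model is as uncertain as a uniform choice among 50 words.
uniform_log_probs = [math.log(1 / 50)] * 200
print(perplexity(uniform_log_probs))
```

Under this definition, a model that concentrates more probability on the words that actually occur in held-out micro-blogs has lower perplexity, which is the sense in which one topic model "outperforms" another on this metric.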

Author information

Corresponding authors

Correspondence to Li Yi or Yangsen Zhang.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Zhang, Y., Yang, B., Yi, L., Liu, Y., Zhang, Y. (2018). HDP-TUB Based Topic Mining Method for Chinese Micro-blogs. In: Huang, X., Jiang, J., Zhao, D., Feng, Y., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2017. Lecture Notes in Computer Science, vol 10619. Springer, Cham. https://doi.org/10.1007/978-3-319-73618-1_75

  • DOI: https://doi.org/10.1007/978-3-319-73618-1_75

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73617-4

  • Online ISBN: 978-3-319-73618-1

  • eBook Packages: Computer Science, Computer Science (R0)
