
Micro-blog topic detection method based on BTM topic model and K-means clustering algorithm

Automatic Control and Computer Sciences

Abstract

The development of micro-blogs, which generate large volumes of short texts, has made communication more convenient. At the same time, discovering topics in short texts has become a difficult problem. Traditional topic models, such as probabilistic latent semantic analysis (PLSA) and Latent Dirichlet Allocation (LDA), model short texts poorly because they suffer from severe data sparsity when processing them. The K-means clustering algorithm, for its part, can make topics discriminative when the dataset is dense and the differences among topic documents are distinct. In this paper, the Biterm Topic Model (BTM) is employed to process short micro-blog texts and alleviate the sparsity problem. We then integrate the K-means clustering algorithm into BTM to further discover topics. Experimental results on Sina micro-blog short text collections demonstrate that our method can discover topics effectively.
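The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the preview does not say how K-means is integrated into BTM, so the code only shows (a) how biterms, the unordered word pairs that BTM models, are extracted from one short text, and (b) a plain K-means pass over per-document topic proportions. The function names, the toy tokens, and the `doc_topics` vectors are hypothetical stand-ins for real BTM output.

```python
from itertools import combinations
import random


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means over equal-length lists of floats."""
    rng = random.Random(seed)
    # Initialise centers from k distinct input points.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])),
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical per-document topic proportions, standing in for BTM output.
doc_topics = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.80, 0.10],
]

biterms = extract_biterms(["micro", "blog", "topic"])
labels, centers = kmeans(doc_topics, k=2)
```

Because biterms are pooled over the whole corpus rather than counted per document, word co-occurrence statistics stay usable even when each post holds only a few words, which is the sparsity argument the abstract makes. Clustering on the topic proportions is one plausible reading of "integrating K-means into BTM"; the full article may define the combination differently.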



Author information

Corresponding author

Correspondence to Weijiang Li.

Additional information

The article is published in the original.

About this article

Cite this article

Li, W., Feng, Y., Li, D. et al. Micro-blog topic detection method based on BTM topic model and K-means clustering algorithm. Aut. Control Comp. Sci. 50, 271–277 (2016). https://doi.org/10.3103/S0146411616040040

  • DOI: https://doi.org/10.3103/S0146411616040040
