Detection and Extraction of Hot Topics on Chinese Microblogs

Cognitive Computation

Abstract

People's perceptions of reality are conditioned on how others see the world. However, given the vast amount of information available through online media such as microblog sites, people cannot absorb all of it in a timely manner. The detection of hot topics on microblog platforms is therefore becoming increasingly important. This paper proposes a new hot-topic detection and extraction approach based on language and topic models. It detects hot topics by analyzing the differences between the emotion-distribution language models of adjacent time intervals. We then estimate the importance of each microblog from its content and repost degree and generate topic models. Experiments conducted on Sina Microblog show that the proposed approach detects and extracts hot topics effectively and can thus assist the platform in managing and monitoring hot topics.
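The detection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the difference between adjacent emotion-distribution language models is measured, so everything specific here is an assumption for illustration only: the toy English lexicon `EMOTION_LEXICON`, additive smoothing with `alpha`, Kullback-Leibler divergence as the difference measure, and the `threshold` value. It is not the authors' implementation, and it leaves out the importance weighting by repost degree and the topic-model extraction.

```python
import math
from collections import Counter

# Hypothetical toy emotion lexicon (word -> emotion category); not from the paper.
EMOTION_LEXICON = {
    "happy": "joy", "great": "joy",
    "angry": "anger", "outrage": "anger",
    "sad": "sadness", "tragic": "sadness",
    "scared": "fear",
}
EMOTIONS = sorted(set(EMOTION_LEXICON.values()))


def emotion_distribution(posts, alpha=1.0):
    """Additively smoothed emotion distribution over the posts of one interval."""
    counts = Counter()
    for post in posts:
        for token in post.lower().split():
            emotion = EMOTION_LEXICON.get(token)
            if emotion is not None:
                counts[emotion] += 1
    total = sum(counts.values()) + alpha * len(EMOTIONS)
    return {e: (counts[e] + alpha) / total for e in EMOTIONS}


def kl_divergence(p, q):
    """KL(p || q); smoothing guarantees that every probability is positive."""
    return sum(p[e] * math.log(p[e] / q[e]) for e in EMOTIONS)


def detect_shifts(intervals, threshold=0.2):
    """Return indices of intervals whose emotion distribution diverges from
    that of the preceding interval by more than the (assumed) threshold."""
    dists = [emotion_distribution(posts) for posts in intervals]
    flagged = []
    for t in range(1, len(dists)):
        if kl_divergence(dists[t], dists[t - 1]) > threshold:
            flagged.append(t)
    return flagged


# Three consecutive time intervals of toy posts.
intervals = [
    ["happy day", "great lunch"],
    ["great weather", "happy news"],
    ["angry outrage tragic", "outrage angry sad scared"],
]
flagged = detect_shifts(intervals)
print(flagged)  # only the last interval is flagged for this toy data
```

In this toy data the first two intervals share the same emotion profile, so their divergence is zero, while the third interval shifts sharply toward anger and sadness and is flagged as a candidate hot-topic interval. A real system would use a Chinese affective lexicon and word segmentation in place of the whitespace tokenizer assumed here.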



Funding

This work is partially supported by grants from the Natural Science Foundation of China (Nos. 61402075, 61572102, 61277370), the Natural Science Foundation of Liaoning Province, China (Nos. 201202031, 2014020003), and the State Education Ministry and the Research Fund for the Doctoral Program of Higher Education (No. 20090041110002).

Author information

Corresponding author

Correspondence to Liang Yang.

Ethics declarations

Conflict of Interest

Liang Yang, Hongfei Lin, Yuan Lin, and Shengbo Liu declare that they have no conflict of interest.

Informed Consent

All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008. Additional informed consent was obtained from all patients for whom identifying information is included in this article.

Human and Animal Rights

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Yang, L., Lin, H., Lin, Y. et al. Detection and Extraction of Hot Topics on Chinese Microblogs. Cogn Comput 8, 577–586 (2016). https://doi.org/10.1007/s12559-015-9380-6

  • DOI: https://doi.org/10.1007/s12559-015-9380-6
