Micro-blog Post Topic Drift Detection Based on LDA Model

  • Conference paper
Behavior and Social Computing (BSIC 2013, BSI 2013)

Abstract

Micro-blog posts cover a large number of topics and carry a great deal of useful information alongside a great deal of junk, which makes micro-blog post topics highly prone to drift. Topic drift has two main aspects: changes in the topic over time, and noise introduced as the number of micro-blog posts grows. We propose a topic drift detection method based on the LDA model. The method uses the Gibbs sampling algorithm to obtain the probability distribution of micro-blog post words based on word correlation, identifies topic boundaries with a dynamic constant method, extracts topic words by computing lexical information entropy within the topic field, and detects topic drift by aligning topic word sequences under a discrete-time model. In our experiments on topic drift detection based on the LDA model, the method proved very effective at detecting topic drift in micro-blog posts.
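The Gibbs sampling step can be illustrated with a minimal sketch of the standard collapsed Gibbs sampler for LDA. This is not the authors' implementation: the abstract does not give their word-correlation variant, so the function names (`lda_gibbs`, `top_words`), the hyperparameters `alpha` and `beta`, the iteration count, and the toy posts are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Standard collapsed Gibbs sampling for LDA (illustrative sketch only).

    docs is a list of tokenized posts (lists of words). Returns the vocabulary,
    the topic-word distributions phi, and the document-topic counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional p(z = k | all other assignments).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed estimate of each topic's distribution over words.
    phi = [
        [
            (topic_word[k][v] + beta) / (topic_total[k] + vocab_size * beta)
            for v in range(vocab_size)
        ]
        for k in range(num_topics)
    ]
    return vocab, phi, doc_topic


def top_words(vocab, phi, n):
    """Return the n most probable words for each topic."""
    result = []
    for row in phi:
        order = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)
        result.append([vocab[v] for v in order[:n]])
    return result


if __name__ == "__main__":
    # Hypothetical toy posts, already tokenized.
    posts = [
        ["match", "goal", "team", "goal"],
        ["team", "coach", "match"],
        ["phone", "screen", "battery"],
        ["battery", "phone", "price"],
    ]
    vocab, phi, _ = lda_gibbs(posts, num_topics=2)
    for k, words in enumerate(top_words(vocab, phi, 3)):
        print(k, words)
```

Running the sampler separately on posts from successive time windows and comparing the resulting top-word lists would be one simple way to look for drift. The boundary identification, entropy-based word selection, and sequence alignment described in the abstract are not reproduced here.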

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Liu, Q., Huang, H., Feng, C. (2013). Micro-blog Post Topic Drift Detection Based on LDA Model. In: Cao, L., et al. Behavior and Social Computing. BSIC 2013, BSI 2013. Lecture Notes in Computer Science, vol 8178. Springer, Cham. https://doi.org/10.1007/978-3-319-04048-6_10

  • DOI: https://doi.org/10.1007/978-3-319-04048-6_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-04047-9

  • Online ISBN: 978-3-319-04048-6

  • eBook Packages: Computer Science, Computer Science (R0)
