Abstract
Micro-blog posts cover a large number of topics and carry both useful information and a great deal of junk, which makes their topics highly prone to drift. Topic drift in micro-blog posts has two main aspects: changes in the topic over time, and noise introduced as the number of posts grows. We propose a topic drift detection method based on the LDA model. The method uses Gibbs sampling to obtain the probability distribution of micro-blog post words based on word correlation, identifies topic boundaries with a dynamic constant method, extracts topic words by computing lexical information entropy within the topic field, and detects topic drift by aligning topic word sequences under a discrete-time model. Our experiments on topic drift detection with the LDA model indicate that the method is very effective for micro-blog posts.
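The topic word extraction step can be illustrated with a minimal Python sketch. It assumes the topic-word distributions have already been estimated (for example by Gibbs sampling) and are given as a hypothetical list `phi` of dictionaries. The abstract does not define the entropy measure or the ranking rule, so both the entropy of a word over topics (under a uniform topic prior) and the score `prob / (1 + entropy)` are assumptions made for illustration, not the authors' formulation.

```python
import math


def word_topic_entropy(phi, word):
    """Entropy (in bits) of a word's distribution over topics.

    Assumption: topics are equally likely a priori, so P(topic | word)
    is proportional to P(word | topic).
    """
    weights = [topic.get(word, 0.0) for topic in phi]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    probs = [w / total for w in weights]
    return sum(-p * math.log2(p) for p in probs if p > 0.0)


def extract_topic_words(phi, topic_index, top_n=2):
    """Return the top_n words of one topic, favouring low-entropy words.

    Assumption: the score prob / (1 + entropy) is an illustrative choice;
    it penalises words that are spread evenly across topics.
    """
    scored = []
    for word, prob in phi[topic_index].items():
        entropy = word_topic_entropy(phi, word)
        scored.append((prob / (1.0 + entropy), word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_n]]


# Hypothetical toy distributions P(word | topic) for two topics.
phi = [
    {"match": 0.5, "goal": 0.3, "today": 0.2},
    {"stock": 0.5, "market": 0.3, "today": 0.2},
]

sports_words = extract_topic_words(phi, 0)
```

In this toy data, "today" appears equally in both topics, so its entropy is high and it is ranked below words that occur in only one topic.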
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Liu, Q., Huang, H., Feng, C. (2013). Micro-blog Post Topic Drift Detection Based on LDA Model. In: Cao, L., et al. Behavior and Social Computing. BSIC 2013, BSI 2013. Lecture Notes in Computer Science, vol 8178. Springer, Cham. https://doi.org/10.1007/978-3-319-04048-6_10
DOI: https://doi.org/10.1007/978-3-319-04048-6_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04047-9
Online ISBN: 978-3-319-04048-6
eBook Packages: Computer Science (R0)