Abstract
Inferring discriminative and coherent topics from short texts is a critical task. Beyond knowing which topics can be extracted from short texts, people also want to follow how those topics evolve over time. In this paper, we present a novel model for short texts, referred to as the topic trend detection (TTD) model. Built on an optimized topic model that we propose, the TTD model derives more typical terms and itemsets to represent the topics of short texts and improves the coherence of the topic representations. We then extend the topic itemsets obtained from the optimized topic model with word embeddings to detect topic trends. Extensive experiments on several real-world short text collections from Sina Microblog demonstrate that our method achieves topic representations comparable to those of state-of-the-art models, as measured by topic coherence, and show its application to identifying topic trends in Sina Microblog.
Notes
- 1.
- 2. In the remainder of this paper, event names from the microblog are given in English to avoid problems with Chinese characters in TeX.
Acknowledgement
This work is funded by the National Natural Science Foundation of China under Grants No. 61472329 and No. 61532009, and by the Innovation Fund of Xihua University. We would like to thank the anonymous reviewers for their helpful comments.
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
He, L., Du, Y., Ye, Y. (2017). Tracking Topic Trends for Short Texts. In: Li, J., Zhou, M., Qi, G., Lao, N., Ruan, T., Du, J. (eds) Knowledge Graph and Semantic Computing. Language, Knowledge, and Intelligence. CCKS 2017. Communications in Computer and Information Science, vol 784. Springer, Singapore. https://doi.org/10.1007/978-981-10-7359-5_12
DOI: https://doi.org/10.1007/978-981-10-7359-5_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7358-8
Online ISBN: 978-981-10-7359-5
eBook Packages: Computer Science, Computer Science (R0)