Tracking Topic Trends for Short Texts

He, Liyan; Du, Yajun; Ye, Yongtao

doi:10.1007/978-981-10-7359-5_12

Part of the book series: Communications in Computer and Information Science (CCIS, volume 784)

Included in the following conference series:

China Conference on Knowledge Graph and Semantic Computing

Abstract

Inferring discriminative and coherent topics from short texts is a critical task. Moreover, people want to know not only what kinds of topics can be extracted from these short texts, but also how these topics evolve over time. In this paper, we present a novel model for short texts, referred to as the topic trend detection (TTD) model. Built on an optimized topic model that we propose, the TTD model derives more typical terms and itemsets to represent the topics of short texts and improves the coherence of the topic representations. We then extend the topic itemsets obtained from the optimized topic model with word embeddings to detect topic trends. Extensive experiments on several real-world short-text collections from Sina Microblog demonstrate that our method achieves topic representations comparable to those of state-of-the-art models, as measured by topic coherence, and show its application to identifying topic trends in Sina Microblog.
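
The abstract does not spell out how topic itemsets are extended with word embeddings. As a rough illustration only, the sketch below adds to an itemset every vocabulary word whose cosine similarity to some itemset member reaches a threshold. The function names (`cosine`, `expand_itemset`), the threshold value, and the toy two-dimensional vectors are all assumptions made for this example and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_itemset(itemset, embeddings, threshold=0.8):
    """Return the itemset plus every vocabulary word that is similar enough
    to at least one ORIGINAL itemset member (hypothetical expansion rule)."""
    expanded = set(itemset)
    for word, vec in embeddings.items():
        if word in expanded:
            continue
        # Compare only against the original members, so newly added words
        # do not pull in further words transitively.
        if any(cosine(vec, embeddings[t]) >= threshold
               for t in itemset if t in embeddings):
            expanded.add(word)
    return expanded


# Toy 2-d vectors invented for illustration; real embeddings would come
# from a trained model such as word2vec or GloVe.
TOY_EMBEDDINGS = {
    "earthquake": [1.0, 0.0],
    "quake": [0.9, 0.1],
    "rescue": [0.6, 0.8],
    "concert": [0.0, 1.0],
}

if __name__ == "__main__":
    print(sorted(expand_itemset({"earthquake"}, TOY_EMBEDDINGS)))
```

Lowering the threshold admits more loosely related words, so in practice it would have to be tuned against a coherence measure rather than fixed in advance.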

Notes

1. http://www.zhiweidata.com/.
2. In the rest of the paper, event names from the microblog are given in English to avoid problems with typesetting Chinese in TeX.

Acknowledgement

This work is funded by the National Natural Science Foundation of China under Grants No. 61472329 and No. 61532009 and by the Innovation Fund of Xihua University. We would like to thank the anonymous reviewers for their helpful comments.

Author information

Authors and Affiliations

School of Computer and Software Engineering, Xihua University, Chengdu, 610039, China
Liyan He, Yajun Du & Yongtao Ye

Corresponding author

Correspondence to Liyan He.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Juanzi Li
Beijing Xigema Center, Beijing, China
Ming Zhou
School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu, China
Guilin Qi
Google, Mountain View, California, USA
Ni Lao
East China University of Science and Technology, Shanghai, China
Tong Ruan
Guangdong University of Foreign Studies, Guangzhou, China
Jianfeng Du

About this paper

Cite this paper

He, L., Du, Y., Ye, Y. (2017). Tracking Topic Trends for Short Texts. In: Li, J., Zhou, M., Qi, G., Lao, N., Ruan, T., Du, J. (eds) Knowledge Graph and Semantic Computing. Language, Knowledge, and Intelligence. CCKS 2017. Communications in Computer and Information Science, vol 784. Springer, Singapore. https://doi.org/10.1007/978-981-10-7359-5_12

DOI: https://doi.org/10.1007/978-981-10-7359-5_12
Published: 20 January 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7358-8
Online ISBN: 978-981-10-7359-5
eBook Packages: Computer Science, Computer Science (R0)
