Abstract
Social networks have become a powerful information source. With the advent of new services such as WeChat Official Account, long-text content has become part of social networks. Compared with tweet-style content, long-text content is better organized and less prone to noise. However, existing methods for real-time topic detection on long-text data fall short in sensitivity and scalability, and long-text-based trend prediction methods lack a strong rationale. In this paper, we propose a framework specifically adapted for long-text-based topic analysis, covering both topic detection and popularity prediction. For topic detection, we design a novel real-time topic model, the Cost-Effective And Scalable Embedding model (CEASE), based on improved GloVe models and a keyword frequency clustering algorithm. We then propose strategies for topic tracking and renewal that take topic abortion, merging, and neology into account. For popularity prediction, we propose the Feature-Combined Bass model with Association Analysis (FCA-Bass), whose rationale is transplanted from economics. Our methods are validated by experiments on a real-world dataset from WeChat and are shown to outperform several existing mainstream methods.
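For readers unfamiliar with the economic model that FCA-Bass builds on, the following is a minimal sketch of the classic Bass diffusion curve (Bass, 1969), not of FCA-Bass itself, whose feature combination and association analysis are not described in the abstract. The function names and the parameter values in the usage note are illustrative assumptions: `p` is the coefficient of innovation, `q` the coefficient of imitation, and `m` the eventual audience size.

```python
import math


def bass_cumulative_fraction(t, p, q):
    """Closed-form cumulative adoption fraction F(t) of the classic Bass model.

    F(t) = (1 - exp(-(p + q) t)) / (1 + (q / p) exp(-(p + q) t)), with F(0) = 0.
    """
    if t <= 0:
        return 0.0
    decay = math.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)


def bass_cumulative_adopters(t, m, p, q):
    """Cumulative adopters (here: readers of a topic) at time t, scaled by m."""
    return m * bass_cumulative_fraction(t, p, q)


def bass_peak_time(p, q):
    """Time at which the adoption rate peaks; only meaningful when q > p."""
    return math.log(q / p) / (p + q)
```

With hypothetical values such as `p = 0.03` and `q = 0.38`, `bass_peak_time(p, q)` gives the moment when popularity grows fastest, and `bass_cumulative_adopters(t, m, p, q)` traces the familiar S-shaped curve that saturates at `m`.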
Acknowledgements
This work is supported by the National Key R&D Program of China (2018YFB1004703), the National Natural Science Foundation of China (61872238, 61672348, 61672353), the Shanghai Science and Technology Fund (17510740200), the CCF-Tencent Open Research Fund (RAGR20170114), the Huawei Innovation Research Program (HO2018085286), and the National Key Research of China (2018YFB1003800). Quanquan Chu completed the experiments in this paper while he was an intern at Tencent Shenzhen. The authors would also like to thank Chunxia Jia, Yiming Zhang, Chao Wang, and Tianxiang Gao for their contributions to this paper.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Chu, Q., Cao, Z., Gao, X., He, P., Deng, Q., Chen, G. (2018). Cease with Bass: A Framework for Real-Time Topic Detection and Popularity Prediction Based on Long-Text Contents. In: Chen, X., Sen, A., Li, W., Thai, M. (eds) Computational Data and Social Networks. CSoNet 2018. Lecture Notes in Computer Science, vol 11280. Springer, Cham. https://doi.org/10.1007/978-3-030-04648-4_5
DOI: https://doi.org/10.1007/978-3-030-04648-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04647-7
Online ISBN: 978-3-030-04648-4
eBook Packages: Computer Science, Computer Science (R0)