Abstract
The real-time detection of bursty topics on microblogs has attracted considerable research effort in recent years, owing to its wide use in user-focused tasks such as information recommendation, trend analysis, and document search. Most existing methods achieve good real-time detection performance but give little consideration to topic coherence and topic granularity, both of which are needed for semantic interpretability; as a result, they often produce odd topics that are hard to interpret. More effort is therefore needed to evaluate and improve the intrinsic quality of detected topics at their very early stages. In this paper, we propose a refined tensor decomposition model that detects bursty topics effectively while also evaluating topic coherence and providing informative bursty topics at different burst levels. We evaluated our method over a stream of 7 million microblogs. The experimental results demonstrate both efficiency in topic detection and effectiveness in topic interpretability. Specifically, running on a single machine, our method can consistently handle millions of microblogs per day and present ranked, interpretable topics at different burst levels.
Acknowledgments
The authors gratefully acknowledge the joint research effort between NUDT and Eefung.com. This work is partially supported by the National Key Fundamental Research and Development Program of China (No. 2013CB329601, No. 2013CB329604, No. 2013CB329606) and the National Natural Science Foundation of China (No. 61502517, No. 61372191, No. 61572492). This work is also funded by the major pre-research project of the National University of Defense Technology (NUDT).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Zhang, T., Zhou, B., Huang, J., Jia, Y., Zhang, B., Li, Z. (2017). A Refined Method for Detecting Interpretable and Real-Time Bursty Topic in Microblog Stream. In: Bouguettaya, A., et al. Web Information Systems Engineering – WISE 2017. WISE 2017. Lecture Notes in Computer Science, vol 10569. Springer, Cham. https://doi.org/10.1007/978-3-319-68783-4_1
DOI: https://doi.org/10.1007/978-3-319-68783-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68782-7
Online ISBN: 978-3-319-68783-4
eBook Packages: Computer Science, Computer Science (R0)