A Refined Method for Detecting Interpretable and Real-Time Bursty Topic in Microblog Stream

Zhang, Tao; Zhou, Bin; Huang, Jiuming; Jia, Yan; Zhang, Bing; Li, Zhi

doi:10.1007/978-3-319-68783-4_1


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10569)

Included in the conference series: International Conference on Web Information Systems Engineering


Abstract

The real-time detection of bursty topics on microblogs has attracted considerable research effort in recent years, owing to its wide use in a range of user-focused tasks such as information recommendation, trend analysis, and document search. Most existing methods achieve good real-time detection performance but give little consideration to topic coherence and topic granularity, which determine semantic interpretability; as a result, they often produce odd topics that are hard to interpret. More effort is therefore needed to evaluate and improve the intrinsic quality of detected topics at their very early stages. In this paper, we propose a refined tensor decomposition model that effectively detects bursty topics while evaluating topic coherence and providing informative bursty topics with different burst levels. We evaluated our method on a stream of 7 million microblogs. The experimental results demonstrate both efficiency in topic detection and effectiveness in topic interpretability. Specifically, our method can consistently handle millions of microblogs per day on a single machine and present ranked, interpretable topics with different burst levels.
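The abstract does not say how topic coherence is scored. For orientation only, the sketch that follows computes one widely used coherence score, the average normalized pointwise mutual information (NPMI) over all pairs of a topic's top words, with probabilities estimated from document co-occurrence. It is a generic illustration, not the authors' tensor decomposition model or their coherence measure; the function name `npmi_coherence` and the toy microblogs are hypothetical.

```python
import math
from itertools import combinations
from typing import Iterable, List, Sequence, Set


def npmi_coherence(top_words: Sequence[str], docs: Iterable[Set[str]]) -> float:
    """Average NPMI over all pairs of a topic's top words (generic sketch).

    P(w) is the fraction of documents containing w, and P(wi, wj) is the
    fraction containing both words (document co-occurrence estimates).
    """
    docs = list(docs)
    n = len(docs)
    if n == 0 or len(top_words) < 2:
        raise ValueError("need at least one document and two top words")

    def prob(*words: str) -> float:
        # Fraction of documents that contain every word in `words`.
        return sum(1 for d in docs if all(w in d for w in words)) / n

    scores: List[float] = []
    for wi, wj in combinations(top_words, 2):
        p_ij = prob(wi, wj)
        if p_ij == 0.0:
            scores.append(-1.0)  # words never co-occur: NPMI lower bound
            continue
        pmi = math.log(p_ij / (prob(wi) * prob(wj)))
        denom = -math.log(p_ij)
        # p_ij == 1 means both words occur in every document: treat as +1.
        scores.append(1.0 if denom == 0.0 else pmi / denom)
    return sum(scores) / len(scores)


# Toy usage: each microblog is reduced to its set of tokens (hypothetical data).
docs = [
    {"earthquake", "magnitude", "tsunami"},
    {"earthquake", "tsunami", "warning"},
    {"football", "goal"},
    {"earthquake", "magnitude"},
]
coherent = npmi_coherence(["earthquake", "tsunami", "magnitude"], docs)
incoherent = npmi_coherence(["earthquake", "goal", "warning"], docs)
print(f"{coherent:.3f} {incoherent:.3f}")
```

On this toy data the earthquake topic scores roughly 0.28 and the mixed topic roughly -0.60, so a threshold or ranking on this score would keep the first and discard the second. Document-level co-occurrence is a natural choice for short posts; sliding windows or an external reference corpus are common alternatives for estimating the probabilities.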

Acknowledgments

The authors thank NUDT and Eefung.com for their joint research efforts. This work is partially supported by the National Key Fundamental Research and Development Program of China (No. 2013CB329601, No. 2013CB329604, No. 2013CB329606) and the National Natural Science Foundation of China (No. 61502517, No. 61372191, No. 61572492). This work is also funded by the major pre-research project of the National University of Defense Technology (NUDT).

Author information

Authors and Affiliations

National University of Defense Technology, Changsha, Hunan, China
Tao Zhang, Bin Zhou, Jiuming Huang & Yan Jia
Hunan Eefung Software Co., Ltd., Changsha, Hunan, China
Bing Zhang & Zhi Li

Corresponding author

Correspondence to Tao Zhang.

Editor information

Editors and Affiliations

University of Sydney, Darlington, NSW, Australia
Athman Bouguettaya
Zhejiang University, Hangzhou, China
Yunjun Gao
Institute of Computing for Physics and Technology, Protvino, Russia
Andrey Klimenko
Nanyang Technological University, Singapore, Singapore
Lu Chen
King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Xiangliang Zhang
Institute of Computing for Physics and Technology, Protvino, Russia
Fedor Dzerzhinskiy
Shanghai Jiao Tong University, Minhang Qu, China
Weijia Jia
Institute of Computing for Physics and Technology, Protvino, Russia
Stanislav V. Klimenko
City University of Hong Kong, Kowloon, Hong Kong
Qing Li

About this paper

Cite this paper

Zhang, T., Zhou, B., Huang, J., Jia, Y., Zhang, B., Li, Z. (2017). A Refined Method for Detecting Interpretable and Real-Time Bursty Topic in Microblog Stream. In: Bouguettaya, A., et al. (eds) Web Information Systems Engineering – WISE 2017. WISE 2017. Lecture Notes in Computer Science, vol 10569. Springer, Cham. https://doi.org/10.1007/978-3-319-68783-4_1

DOI: https://doi.org/10.1007/978-3-319-68783-4_1
Published: 04 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68782-7
Online ISBN: 978-3-319-68783-4
eBook Packages: Computer Science, Computer Science (R0)
