A Refined Method for Detecting Interpretable and Real-Time Bursty Topic in Microblog Stream

  • Conference paper
Web Information Systems Engineering – WISE 2017 (WISE 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10569)

Abstract

Real-time detection of bursty topics on microblogs has attracted considerable research effort in recent years because of its wide use in user-focused tasks such as information recommendation, trend analysis, and document search. Most existing methods perform well on real-time detection but give little consideration to topic coherence and topic granularity, which are needed for semantic interpretability; as a result, they often produce odd topics that are hard to interpret. More effort is therefore needed to evaluate and improve the intrinsic quality of detected topics at their very early stages. In this paper, we propose a refined tensor decomposition model that detects bursty topics effectively while also evaluating topic coherence and providing informative bursty topics at different burst levels. We evaluated our method on a stream of 7 million microblogs. The experimental results demonstrate both efficiency in topic detection and effectiveness in topic interpretability. Specifically, on a single machine our method can consistently handle millions of microblogs per day and present ranked, interpretable topics with different burst levels.

Notes

  1. http://www.eefung.com/

  2. http://research.pinnacle.smu.edu.sg/clear/

  3. https://en.wikipedia.org/wiki/Count%E2%80%93min_sketch

Acknowledgments

The authors would like to acknowledge the joint research efforts between NUDT and Eefung.com. This work is partially supported by the National Key Fundamental Research and Development Program of China (No. 2013CB329601, No. 2013CB329604, No. 2013CB329606) and the National Natural Science Foundation of China (No. 61502517, No. 61372191, No. 61572492). This work is also funded by the major pre-research project of the National University of Defense Technology (NUDT).

Author information

Corresponding author

Correspondence to Tao Zhang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zhang, T., Zhou, B., Huang, J., Jia, Y., Zhang, B., Li, Z. (2017). A Refined Method for Detecting Interpretable and Real-Time Bursty Topic in Microblog Stream. In: Bouguettaya, A., et al. Web Information Systems Engineering – WISE 2017. WISE 2017. Lecture Notes in Computer Science, vol 10569. Springer, Cham. https://doi.org/10.1007/978-3-319-68783-4_1

  • DOI: https://doi.org/10.1007/978-3-319-68783-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68782-7

  • Online ISBN: 978-3-319-68783-4

  • eBook Packages: Computer Science, Computer Science (R0)
