Nugget-Based First Story Detection in Twitter Stream

  • Conference paper
Social Media Processing (SMP 2015)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 568)

Abstract

The Twitter First Story Detection (FSD) task is to detect the first tweet about a new event in a tweet stream; it is a hard but important task in Twitter event detection. However, traditional online detection methods require too many comparisons and lack global information about each event during detection. To address these shortcomings, we propose a novel FSD method based on the nugget, a concise description of an event. Our approach dynamically generates and updates a nugget for each detected event during detection. When a new tweet arrives, it is first compared with the nugget of each event and is clustered into an event if it hits that event's nugget; otherwise, it is compared with the individual tweets in the event. Our method improves detection accuracy and reduces the number of comparisons. Experimental results on two public data sets show that our system reaches state-of-the-art performance. In addition, we prove theoretically that our method has an advantage in efficiency.
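The two-stage comparison described above (nugget first, individual tweets as a fallback) can be illustrated with a minimal Python sketch. The abstract does not specify how a nugget is built or what counts as a hit, so everything concrete here is an assumption made for illustration: the nugget is taken to be the most frequent terms of an event, a hit means sharing at least `HIT_MIN` nugget terms, the fallback uses cosine similarity over term counts, and the names `Event`, `process`, `NUGGET_SIZE`, `HIT_MIN`, and `SIM_THRESHOLD` are hypothetical. It is not the authors' implementation.

```python
from collections import Counter
from math import sqrt

# All three constants are illustrative assumptions, not values from the paper.
NUGGET_SIZE = 3      # number of top terms kept in an event's nugget
HIT_MIN = 2          # nugget terms a tweet must contain to "hit" the nugget
SIM_THRESHOLD = 0.5  # cosine threshold for the tweet-by-tweet fallback


def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class Event:
    """A detected event: its tweets plus a nugget that is updated dynamically."""

    def __init__(self, tweet_terms):
        self.tweets = []              # one term-count Counter per tweet
        self.term_counts = Counter()  # in how many tweets each term appears
        self.nugget = set()
        self.add(tweet_terms)

    def add(self, tweet_terms):
        self.tweets.append(Counter(tweet_terms))
        self.term_counts.update(set(tweet_terms))
        # Assumed nugget definition: the event's most frequent terms.
        self.nugget = {t for t, _ in self.term_counts.most_common(NUGGET_SIZE)}

    def hits(self, tweet_terms):
        return len(self.nugget & set(tweet_terms)) >= HIT_MIN


def process(tweet_terms, events):
    """Cluster one tweet; return True if it is judged a first story."""
    # Stage 1: one cheap comparison per event, against its nugget.
    for event in events:
        if event.hits(tweet_terms):
            event.add(tweet_terms)
            return False
    # Stage 2: no nugget was hit, so compare with individual tweets.
    vec = Counter(tweet_terms)
    for event in events:
        if any(cosine(vec, old) >= SIM_THRESHOLD for old in event.tweets):
            event.add(tweet_terms)
            return False
    # Nothing matched: the tweet starts a new event.
    events.append(Event(tweet_terms))
    return True
```

Calling `process` on each tokenized tweet in arrival order, with one shared `events` list, flags the tweets that open a new event. When the nugget is hit, the tweet costs one comparison per event rather than one per stored tweet, which is the source of the reduction in comparisons that the abstract claims.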

Notes

  1. A bucket contains tweets about the same event, which has already been detected.

  2. ftp://jaguar.ncsl.nist.gov/current_docs/TDT3eval/TDT3fsd.pl.

Author information

Corresponding author

Correspondence to Rui Li.

Copyright information

© 2015 Springer Science+Business Media Singapore

About this paper

Cite this paper

Qiu, Y., Li, S., Li, R., Wang, L., Wang, B. (2015). Nugget-Based First Story Detection in Twitter Stream. In: Zhang, X., Sun, M., Wang, Z., Huang, X. (eds) Social Media Processing. SMP 2015. Communications in Computer and Information Science, vol 568. Springer, Singapore. https://doi.org/10.1007/978-981-10-0080-5_7

  • DOI: https://doi.org/10.1007/978-981-10-0080-5_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-0079-9

  • Online ISBN: 978-981-10-0080-5

  • eBook Packages: Computer Science; Computer Science (R0)
