Counteracting Novelty Decay in First Story Detection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10193)

Abstract

In this paper we explore how processing unbounded data streams affects First Story Detection (FSD) accuracy. In particular, we study three types of FSD algorithms: comparison-based, LSH-based and k-term-based FSD. Our experiments reveal for the first time that the novelty scores of all three algorithms decay over time. We explain why the decay is linked to increased space saturation and why it negatively affects detection accuracy. We provide a mathematical decay model that allows observed novelty scores to be compensated by their expected decay. Our experiments show significantly increased performance when the novelty score decay is counteracted.
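
To make the notion of novelty decay concrete, the following is a minimal sketch of comparison-based novelty scoring, assuming the common formulation in which a document's novelty is one minus its highest cosine similarity to any previously seen document. The helper names (`vectorize`, `cosine`, `novelty`), the whitespace tokenizer, and the toy stream are illustrative assumptions, not taken from the paper, and the paper's decay model is not reproduced here. The sketch only shows that the score of a fixed probe document can never rise as more of the stream is stored, which is the saturation effect the abstract refers to.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words term-frequency vector (assumed tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def novelty(doc, seen):
    """Novelty = 1 - similarity to the nearest previously seen document."""
    if not seen:
        return 1.0  # nothing to compare against, so maximally novel
    return 1.0 - max(cosine(doc, other) for other in seen)


# Toy stream (hypothetical data): the stored history grows with every document.
stream = [
    "stock markets rally on tech earnings",
    "earthquake reported near the coast",
    "strong earthquake hits city on the coast",
    "coastal city hit by earthquake",
]

probe = vectorize("earthquake hits coastal city")
seen = []
scores = []
for text in stream:
    seen.append(vectorize(text))
    # Score the same probe against an ever larger history.
    scores.append(novelty(probe, seen))

for size, score in enumerate(scores, start=1):
    print(f"history size {size}: novelty {score:.3f}")
```

Because the nearest neighbour is taken over a growing set, the probe's score is non-increasing in the history size. A compensation step in the spirit of the abstract would rescale each observed score by the decay expected at that point in the stream; the form of that expected decay is defined in the paper itself and is deliberately left out of this sketch.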

Notes

  1. Available at: http://demeter.inf.ed.ac.uk/cross/.

Author information

Corresponding author

Correspondence to Dominik Wurzer.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Qin, Y., Wurzer, D., Lavrenko, V., Tang, C. (2017). Counteracting Novelty Decay in First Story Detection. In: Jose, J., et al. Advances in Information Retrieval. ECIR 2017. Lecture Notes in Computer Science, vol 10193. Springer, Cham. https://doi.org/10.1007/978-3-319-56608-5_48

  • DOI: https://doi.org/10.1007/978-3-319-56608-5_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56607-8

  • Online ISBN: 978-3-319-56608-5

  • eBook Packages: Computer Science, Computer Science (R0)
