Abstract
In this paper we explore the impact of processing unbounded data streams on First Story Detection (FSD) accuracy. In particular, we study three types of FSD algorithms: comparison-based, LSH-based and k-term-based FSD. Our experiments reveal for the first time that the novelty scores of all three algorithms decay over time. We explain how this decay is linked to increasing space saturation and why it negatively affects detection accuracy. We provide a mathematical decay model that allows observed novelty scores to be compensated by their expected decay. Our experiments show significantly increased performance when the novelty score decay is counteracted.
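As an illustration of the compensation idea only, the following is a minimal sketch and not the decay model of the paper, which the abstract does not state. It assumes a comparison-based detector whose raw novelty score is one minus the maximum cosine similarity to all previously seen documents. The decay curve `expected_novelty`, its logarithmic form, the parameter `alpha`, and the choice to compensate by division are all hypothetical assumptions made for this example.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expected_novelty(n_seen: int, alpha: float = 0.1) -> float:
    """HYPOTHETICAL decay curve: expected novelty after n_seen documents.

    The logarithmic form and alpha are illustrative assumptions,
    not the model proposed in the paper.
    """
    return 1.0 / (1.0 + alpha * math.log1p(n_seen))


def compensated_scores(docs, alpha: float = 0.1):
    """Return (raw, compensated) novelty scores for a stream of texts."""
    seen = []
    results = []
    for text in docs:
        vec = Counter(text.lower().split())
        # Raw novelty: distance to the nearest previously seen document.
        nearest = max((cosine(vec, s) for s in seen), default=0.0)
        raw = max(0.0, 1.0 - nearest)
        # Assumed compensation: divide by the expected decayed novelty.
        adjusted = raw / expected_novelty(len(seen), alpha)
        results.append((raw, adjusted))
        seen.append(vec)
    return results
```

Calling `compensated_scores(["earthquake hits city", "earthquake hits city", "election results announced"])` yields one `(raw, compensated)` pair per document. Under these assumptions, a document arriving later in the stream has its raw score scaled up, so a fixed detection threshold remains comparable as the space of seen documents fills.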
Notes
1. Available at: http://demeter.inf.ed.ac.uk/cross/.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Qin, Y., Wurzer, D., Lavrenko, V., Tang, C. (2017). Counteracting Novelty Decay in First Story Detection. In: Jose, J., et al. Advances in Information Retrieval. ECIR 2017. Lecture Notes in Computer Science, vol 10193. Springer, Cham. https://doi.org/10.1007/978-3-319-56608-5_48
DOI: https://doi.org/10.1007/978-3-319-56608-5_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56607-8
Online ISBN: 978-3-319-56608-5
eBook Packages: Computer Science, Computer Science (R0)