Abstract
The Twitter First Story Detection (FSD) task is to detect the first tweet about a new event in a tweet stream; it is a hard but important task in Twitter event detection. Traditional online detection methods, however, require too many comparisons and lack global information about each event during detection. To address these shortcomings, we propose a novel FSD method based on nuggets, which describe an event concisely. Our approach dynamically generates and updates a nugget for each detected event during detection. When a new tweet arrives, it is first compared with the nugget of each event and is clustered into an event if it hits that event's nugget; otherwise, it is compared with the individual tweets in the event. Our method improves detection accuracy and reduces the number of comparisons. Experimental results on two public data sets show that our system reaches state-of-the-art performance. In addition, we prove theoretically that our method has an efficiency advantage.
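The two-stage comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how a nugget is built or what counts as a "hit", so the sketch assumes a nugget is the set of an event's most frequent terms, that a hit means a sufficient fraction of nugget terms appear in the tweet, and that the fallback comparison is cosine similarity over term counts. The names `Event`, `process`, `nugget_size`, `hit_ratio`, and `sim_threshold` are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class Event:
    """A detected event: its tweets plus a nugget.

    Assumption: the nugget is the set of the event's top-k most frequent terms.
    """

    def __init__(self, tokens, nugget_size=3):
        self.tweets = []
        self.term_counts = Counter()
        self.nugget_size = nugget_size
        self.nugget = set()
        self.add(tokens)

    def add(self, tokens):
        self.tweets.append(Counter(tokens))
        self.term_counts.update(tokens)
        # Refresh the nugget each time the event grows.
        self.nugget = {t for t, _ in self.term_counts.most_common(self.nugget_size)}


def process(tokens, events, hit_ratio=0.6, sim_threshold=0.5):
    """Handle one incoming tweet; return (is_first_story, comparisons)."""
    vec = Counter(tokens)
    terms = set(tokens)
    comparisons = 0

    # Stage 1: one comparison per event, against its nugget only.
    for ev in events:
        comparisons += 1
        if len(ev.nugget & terms) >= hit_ratio * len(ev.nugget):
            ev.add(tokens)
            return False, comparisons

    # Stage 2: no nugget was hit, so compare with individual tweets.
    for ev in events:
        for tw in ev.tweets:
            comparisons += 1
            if cosine(vec, tw) >= sim_threshold:
                ev.add(tokens)
                return False, comparisons

    # Nothing matched: treat the tweet as the first story of a new event.
    events.append(Event(tokens))
    return True, comparisons
```

In this sketch a tweet that hits a nugget costs at most one comparison per event, which is where the reduction in comparisons would come from; the thresholds are arbitrary placeholders and would need tuning on real data.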
Notes
1. A bucket contains tweets about the same event, which has already been detected.
2. ftp://jaguar.ncsl.nist.gov/current_docs/TDT3eval/TDT3fsd.pl
Copyright information
© 2015 Springer Science+Business Media Singapore
About this paper
Cite this paper
Qiu, Y., Li, S., Li, R., Wang, L., Wang, B. (2015). Nugget-Based First Story Detection in Twitter Stream. In: Zhang, X., Sun, M., Wang, Z., Huang, X. (eds) Social Media Processing. SMP 2015. Communications in Computer and Information Science, vol 568. Springer, Singapore. https://doi.org/10.1007/978-981-10-0080-5_7
DOI: https://doi.org/10.1007/978-981-10-0080-5_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0079-9
Online ISBN: 978-981-10-0080-5
eBook Packages: Computer Science, Computer Science (R0)