ABSTRACT
Wikipedia encyclopaedia projects, which consist of vast collections of user-edited articles covering a wide range of topics, are among the most popular websites on the internet. With so many users working collaboratively, mainstream events are often reflected very quickly both in the edits authors make and in the articles users read. With temporal signals such as changing article content, page viewing activity and the link graph readily available, Wikipedia has gained attention in recent years as a source of temporal event information. This paper gives an overview of the characteristics and past work that support the use of Wikipedia (the English edition, in this case) for time-aware information retrieval research. We discuss the main temporal signals available in article content and meta-data, with illustrative analysis, and briefly describe the source and nature of each signal and any issues that may complicate its extraction and use. To encourage further temporal research based on Wikipedia, we have released all the distilled datasets referred to in this paper.