DOI: 10.1145/3477495.3531731
Research Article, Public Access

Wikimarks: Harvesting Relevance Benchmarks from Wikipedia

Published: 07 July 2022

Abstract

We provide a resource for automatically harvesting relevance benchmarks from Wikipedia, which we refer to as "Wikimarks" to differentiate them from manually created benchmarks. Unlike simulated benchmarks, Wikimarks are based on the manual annotations of Wikipedia authors. Studies on the TREC Complex Answer Retrieval track demonstrated that leaderboards under Wikimarks and under manually annotated benchmarks are very similar. Because of their ready availability, Wikimarks can fill an important need in Information Retrieval research.
We provide a meta-resource for harvesting Wikimarks for several information retrieval tasks across different languages: paragraph retrieval, entity ranking, query-specific clustering, outline prediction, and relevant entity linking, among others. In addition, we provide example Wikimarks for English, Simple English, and Japanese, derived from the 1 January 2022 Wikipedia dump.
Resource available: https://trema-unh.github.io/wikimarks/
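The harvesting idea in the abstract can be illustrated with a toy sketch. This is not the actual Wikimarks tooling (which operates on full Wikipedia dumps); the helper name, identifier scheme, and inline article below are invented for illustration. The premise: treat each article-title/section-heading pair as a query, and treat the paragraphs a Wikipedia author placed under that heading as its relevant passages, yielding qrels-style judgments without manual assessment.

```python
# Toy sketch of harvesting a passage-retrieval benchmark from one
# article's section structure. All names here are illustrative, not
# part of the Wikimarks resource.
from collections import defaultdict

def harvest_wikimark(title, sections):
    """Build qrels: 'title / heading' is the query; each paragraph the
    Wikipedia author placed under that heading is a relevant passage."""
    qrels = defaultdict(list)
    for heading, paragraphs in sections:
        query = f"{title} / {heading}"
        for pid, _text in enumerate(paragraphs):
            # Passage id encodes its provenance; grade 1 = relevant.
            qrels[query].append((f"{title}:{heading}:{pid}", 1))
    return dict(qrels)

# A miniature stand-in for a parsed Wikipedia article.
article = ("Green sea turtle", [
    ("Habitat", ["Tropical and subtropical seas...", "Nesting beaches..."]),
    ("Diet",    ["Mostly seagrasses and algae..."]),
])

qrels = harvest_wikimark(*article)
# qrels maps "Green sea turtle / Habitat" to two relevant passage ids
# and "Green sea turtle / Diet" to one.
```

Because the "annotations" come from how authors organized the article rather than from assessors, the same procedure applies uniformly to any language edition and any dump date.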


Cited By

  • (2024) Natural Language Processing in Knowledge-Based Support for Operator Assistance. Applied Sciences 14(7): 2766. DOI: 10.3390/app14072766. Online publication date: 26-Mar-2024
  • (2023) Answering Topical Information Needs Using Neural Entity-Oriented Information Retrieval and Extraction. ACM SIGIR Forum 56(2): 1-2. DOI: 10.1145/3582900.3582926. Online publication date: 31-Jan-2023
  • (2023) Perspectives on Large Language Models for Relevance Judgment. In Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval, 39-50. DOI: 10.1145/3578337.3605136. Online publication date: 9-Aug-2023


    Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022, 3569 pages
ISBN: 9781450387323
DOI: 10.1145/3477495

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. query-specific clustering
    2. relevant entity linking
    3. test collections

Conference

SIGIR '22

Acceptance Rates

Overall Acceptance Rate: 792 of 3,983 submissions, 20%

