DOI: 10.1145/3477495.3531731
Research Article, Public Access

Wikimarks: Harvesting Relevance Benchmarks from Wikipedia

Published: 07 July 2022

Abstract

We provide a resource for automatically harvesting relevance benchmarks from Wikipedia, which we refer to as "Wikimarks" to differentiate them from manually created benchmarks. Unlike simulated benchmarks, Wikimarks are based on the manual annotations of Wikipedia authors. Studies on the TREC Complex Answer Retrieval track demonstrated that leaderboards under Wikimarks and under manually annotated benchmarks are very similar. Because of their ready availability, Wikimarks can fill an important need in Information Retrieval research.
We provide a meta-resource for harvesting Wikimarks for several information retrieval tasks across different languages: paragraph retrieval, entity ranking, query-specific clustering, outline prediction, and relevant entity linking, among others. In addition, we provide example Wikimarks for English, Simple English, and Japanese, derived from the 1 January 2022 Wikipedia dump.
Resource available: https://trema-unh.github.io/wikimarks/
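The harvesting idea in the abstract can be illustrated with a toy sketch. This is not the actual Wikimarks tooling (which operates on full Wikipedia dumps); the helper name, identifier scheme, and inline article below are invented for illustration. The premise: treat each article-title/section-heading pair as a query, and treat the paragraphs a Wikipedia author placed under that heading as its relevant passages, yielding qrels-style judgments without manual assessment.

```python
# Toy sketch of harvesting a passage-retrieval benchmark from one
# article's section structure. All names here are illustrative, not
# part of the Wikimarks resource.
from collections import defaultdict

def harvest_wikimark(title, sections):
    """Build qrels: 'title / heading' is the query; each paragraph the
    Wikipedia author placed under that heading is a relevant passage."""
    qrels = defaultdict(list)
    for heading, paragraphs in sections:
        query = f"{title} / {heading}"
        for pid, _text in enumerate(paragraphs):
            # Passage id encodes its provenance; grade 1 = relevant.
            qrels[query].append((f"{title}:{heading}:{pid}", 1))
    return dict(qrels)

# A miniature stand-in for a parsed Wikipedia article.
article = ("Green sea turtle", [
    ("Habitat", ["Tropical and subtropical seas...", "Nesting beaches..."]),
    ("Diet",    ["Mostly seagrasses and algae..."]),
])

qrels = harvest_wikimark(*article)
# qrels maps "Green sea turtle / Habitat" to two relevant passage ids
# and "Green sea turtle / Diet" to one.
```

Because the "annotations" come from how authors organized the article rather than from assessors, the same procedure applies uniformly to any language edition and any dump date.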


Cited By

  • (2024) Natural Language Processing in Knowledge-Based Support for Operator Assistance. Applied Sciences 14(7): 2766. DOI: 10.3390/app14072766. Online publication date: 26-Mar-2024
  • (2023) Answering Topical Information Needs Using Neural Entity-Oriented Information Retrieval and Extraction. ACM SIGIR Forum 56(2): 1-2. DOI: 10.1145/3582900.3582926. Online publication date: 31-Jan-2023
  • (2023) Perspectives on Large Language Models for Relevance Judgment. In Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval, 39-50. DOI: 10.1145/3578337.3605136. Online publication date: 9-Aug-2023


    Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022, 3569 pages
ISBN: 9781450387323
DOI: 10.1145/3477495

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. query-specific clustering
    2. relevant entity linking
    3. test collections

Conference

SIGIR '22

Acceptance Rates

Overall Acceptance Rate: 792 of 3,983 submissions, 20%

