This volume contains the reports on the two keynote addresses and the papers accepted for presentation at the Data infrastructurEs for Supporting Information Retrieval Evaluation -- DESIRE 2011 -- Workshop, held on 28 October 2011 in conjunction with the 20th ACM International Conference on Information and Knowledge Management (CIKM), Glasgow, UK.
The theme of the workshop lies in the area of Information Retrieval evaluation. Information Retrieval has a strong and long tradition, dating back to the 1960s, of producing and processing the scientific data resulting from the experimental evaluation of search algorithms and search systems. This attitude towards evaluation has led to fast and continuous progress in the evolution of information retrieval systems and search engines.
However, to make the test collections used in these evaluation activities understandable and usable, they must be endowed with auxiliary information, e.g., provenance, quality, and context. Therefore, there is a need for metadata models able to describe the main characteristics of evaluation data. In addition, to make distributed data collections accessible, sharable, and interoperable, there is a need for advanced data infrastructures.
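As a concrete, if simplified, illustration of what such a metadata record might contain (a sketch of our own, not a model proposed at the workshop; all field names and values are hypothetical), consider:

```python
# Hypothetical sketch of a metadata record for an evaluation test
# collection, bundling the auxiliary information mentioned above:
# provenance, quality, and context. Not a standard or a workshop proposal.
from dataclasses import dataclass, field

@dataclass
class TestCollectionMetadata:
    identifier: str  # a persistent, citable identifier
    provenance: dict = field(default_factory=dict)  # who built it, from what, when
    quality: dict = field(default_factory=dict)     # e.g. pool depth, assessor agreement
    context: dict = field(default_factory=dict)     # task, language, campaign

record = TestCollectionMetadata(
    identifier="doi:10.0000/example",  # placeholder identifier
    provenance={"creator": "Campaign X", "derived_from": "Web crawl Y", "year": 2011},
    quality={"pool_depth": 100, "assessor_agreement": 0.78},
    context={"task": "ad hoc retrieval", "language": "en"},
)
```

A record of this kind is what would let an infrastructure expose test collections in a searchable, citable, and reusable way.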
In contrast, the information retrieval area has barely explored and exploited the possibilities for managing, storing, and effectively accessing the scientific data produced during the evaluation studies by making use of the methods typical of the database and knowledge management areas.
Over the years, the information retrieval area has produced a vast set of large test collections, which have become the main benchmark tools of the area and contribute to reproducible and comparable experiments. However, these same collections have not been organised into coherent and integrated infrastructures that make them accessible, searchable, citable, exploitable, and reusable by all potentially interested researchers, developers, and user communities.
It is thus time for these three communities -- information retrieval, databases, and knowledge management -- to join efforts, meet, and cooperate to address the problem of envisaging and designing useful infrastructures able to coherently manage the pertinent data collections and sources of information, and to take concrete steps towards developing them. Information retrieval experts need to recognise this need, while database and knowledge management experts need to understand the problem and work together to solve it using the methods and techniques of information management. Taking these issues into consideration, the main objective of the workshop is to bring together experts from the three areas, to encourage them to address the problem in an integrated and coherent way, and to coordinate efforts towards drawing a roadmap and suggesting best practices for an effective solution. The topics addressed by the DESIRE 2011 workshop include:
Conceptual and logical data models for representing IR evaluation scientific data
Metadata formats for describing scientific data produced during IR evaluation
Knowledge management for IR experimental evaluation
Data quality, provenance, adaptability and reusability in IR evaluation
Data pre- and post-processing, metrics, and analyses in IR evaluation
Data exchange, integration, evolution and migration for IR evaluation infrastructures
Workflow, Web services and Web service composition for IR evaluation infrastructures
Information Extraction and Text Mining for linking scientific literature and experimental data
Data citation
Evaluation, Test collections, Crowdsourcing for IR evaluation
Visualization of scientific data coming from experimental evaluation
The two keynote addresses dealt, respectively, with the design of evaluation infrastructures supporting interactive information retrieval, and with the management of data by means of a conceptual representation of the domain of interest of an information management application. Briefly, their contents were as follows:
The keynote address of Professor Norbert Fuhr of the University of Duisburg-Essen, Germany, entitled An Infrastructure for Supporting the Evaluation of Interactive Information Retrieval, presented a testbed for the evaluation of interactive information access. Starting with the INEX interactive track in 2004, the group led by Professor Fuhr developed the Daffodil (now ezDL) framework, which provides an experimental framework for interactive retrieval that allows for the easy exchange or extension of system components. The framework also contains tools for organising laboratory experiments. Besides extensive logging (including the possibility of exploiting eye-tracking data), the system allows questionnaires to be presented at all stages of a search session (pre-/post-task and pre-/post-session), as well as the scheduling of search tasks and the monitoring of task time.
The keynote address of Professor Maurizio Lenzerini of the Sapienza University of Rome, Italy, entitled Ontology-based Data Management, addressed how ontology-based data management aims at accessing and using data by means of a conceptual representation of the domain of interest in the underlying information system. The talk provided an introduction to ontology-based data management, illustrating the main ideas and techniques for using an ontology to access the data layer of an information system. It then described an architecture for ontology-based data access and discussed the issue of choosing the appropriate language for expressing the various components of that architecture, illustrating the main advantages gained by managing the information system through the ontology. Finally, the issue of developing methodologies and tools for the design and usage of ontology-based data management solutions was discussed.
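To give a feel for the general idea (a toy sketch of our own, not the architecture presented in the talk; the table, class, and axiom names are invented), the following fragment treats mappings as SQL queries over a relational source and answers an ontology-level query by rewriting it against the data layer, taking a single subclass axiom into account:

```python
# Toy illustration of ontology-based data access: a query over ontology
# classes is rewritten, via declarative mappings, into SQL over the source.
import sqlite3

# Data layer: a plain relational source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, "Ada", "R&D"), (2, "Bob", "Sales")])

# Mapping layer: each ontology class is defined by an SQL query over the source.
mappings = {
    "Employee":   "SELECT name FROM emp",
    "Researcher": "SELECT name FROM emp WHERE dept = 'R&D'",
}

# Ontology layer: 'Researcher' is subsumed by 'Employee', so asking for
# employees must also return everything retrievable as a researcher.
subclass_of = {"Researcher": "Employee"}

def instances(cls):
    """Answer 'all instances of cls' by rewriting into a union of mapped queries."""
    classes = [cls] + [sub for sub, sup in subclass_of.items() if sup == cls]
    rewritten = " UNION ".join(mappings[c] for c in classes)
    return {row[0] for row in conn.execute(rewritten)}

print(instances("Researcher"))  # {'Ada'}
print(instances("Employee"))    # {'Ada', 'Bob'}
```

The real technical challenge addressed in this line of work, of course, is doing this with expressive ontology languages while keeping query answering tractable.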
Two types of contribution were presented: communications and position papers. The papers were peer-reviewed by members of the program committee, and the selection procedure took into account originality, clarity, and technical quality. The six accepted and presented papers are included in this volume.
Proceeding Downloads
An infrastructure for supporting the evaluation of interactive information retrieval
A testbed for the evaluation of interactive information access consists of three components: (1) a collection of documents, (2) a set of tasks/usages, and (3) a system. Whereas in most evaluation initiatives only the first two components are provided by ...
Principles for robust evaluation infrastructure
The standard "Cranfield" approach to the evaluation of information retrieval systems has been used and refined for nearly fifty years, and has been a key element in the development of large-scale retrieval systems. The resources created by such ...
A lightweight framework for reproducible parameter sweeping in information retrieval
Information retrieval experiments consist of multiple tasks, such as preprocessing and evaluation, each subject to various parameters affecting their results. Dependencies between tasks exist such that one task may have to use the output of another. ...
Ontology-based data management
Ontology-based data management aims at accessing and using data by means of a conceptual representation of the domain of interest in the underlying information system. In this talk I will provide an introduction to ontology-based data management, by ...
Evaluation with the VIRTUOSO platform: an open source platform for information extraction and retrieval evaluation
Gérard M. Dupont, Gaël de Chalendar, Khaled Khelif, Dmitri Voitsekhovitch, Géraud Canet, Stéphan Brunessaux
This paper describes a software architecture designed to enable the evaluation of information processing and retrieval systems. The overall objective of our project is to provide an open technical framework for the integration of tools for collection, ...
Use cases as a component of information access evaluation
Information access research and development, and information retrieval especially, is based on quantitative and systematic benchmarking. Benchmarking of a computational mechanism is always based on some set of assumptions on how a system with the ...
PatOlympics: an infrastructure for interactive evaluation of patent retrieval tools
We present PatOlympics - the interactive evaluation campaign organized by the Information Retrieval Facility in the context of its yearly symposium. In particular, we focus in this paper on the infrastructure behind the event. This infrastructure, ...
Infrastructure and workflow for the formal evaluation of semantic search technologies
This paper describes an infrastructure for the automated evaluation of semantic technologies and, in particular, semantic search technologies. For this purpose, we present an evaluation framework which follows a service-oriented approach for evaluating ...