ABSTRACT
This paper presents a unified framework for evaluating a range of structured document retrieval (SDR) approaches and tasks. The framework is based on a model of tree retrieval, evaluated using a novel extension of the Structural Relevance (SR) measure. The measure replaces the independence assumption of traditional information retrieval (IR) with a notion of redundancy that accounts for how users navigate inside documents while seeking relevant information. Unlike existing metrics for SDR, the proposed framework does not require computing an ideal ranking, a requirement that has so far prevented the practical application of such measures. Instead, SR builds on a Markovian model of user navigation that can be estimated using structural summaries. The results, supported by experimental validation on INEX data, show that SR defined over a tree retrieval model can provide a common basis for evaluating SDR approaches across various structured search tasks.
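As a rough illustration of what a Markovian model of user navigation can look like, the sketch below treats a few structural-summary nodes as states of a Markov chain and approximates its stationary distribution by power iteration. The node names, the transition probabilities, and the function `stationary_distribution` are all hypothetical choices made for this example. They are not taken from the paper or from INEX data, and the sketch does not claim to reproduce how SR itself is computed.

```python
# Hypothetical structural-summary nodes; names and probabilities are
# illustrative assumptions, not values from the paper or from INEX data.
NODES = ["article", "section", "paragraph"]

# TRANSITIONS[i][j]: assumed probability that a user viewing NODES[i]
# navigates next to NODES[j]. Each row sums to 1.
TRANSITIONS = [
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
    [0.1, 0.5, 0.4],
]


def stationary_distribution(matrix, tol=1e-12, max_iter=10_000):
    """Approximate the row vector pi satisfying pi = pi * matrix.

    Uses plain power iteration starting from the uniform distribution.
    """
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        # One step of the chain: nxt[j] = sum_i pi[i] * matrix[i][j].
        nxt = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


if __name__ == "__main__":
    for node, p in zip(NODES, stationary_distribution(TRANSITIONS)):
        print(f"{node}: {p:.3f}")
```

Under these assumed transitions, the resulting probabilities can be read as the long-run share of navigation steps spent on each kind of element. In practice the transition probabilities would have to be estimated from data rather than set by hand.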
Index Terms
- Structural relevance: a common basis for the evaluation of structured document retrieval