Abstract
Content-only queries on hierarchically structured documents should retrieve the most specific document nodes that are exhaustive with respect to the information need. For this problem, we investigate two methods of augmentation, both of which yield high retrieval quality. We define retrieval effectiveness as the ratio of retrieval quality to response time; thus, fast approximations to the 'correct' retrieval result may yield higher effectiveness. We present a classification scheme for algorithms addressing this issue, and adapt known algorithms from standard document retrieval to XML retrieval. As a new strategy, we propose incremental-interruptible retrieval, which allows for instant presentation of the top-ranking documents. We develop a new algorithm implementing this strategy and evaluate the different methods on the INEX collection.
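To make the effectiveness notion concrete, the following minimal Python sketch computes the ratio of retrieval quality to response time and shows a generic way of delivering ranked nodes one at a time so that a caller can stop early. It is an illustration under stated assumptions, not the algorithm developed in the article: the function names, the quality and timing figures, the node paths, and the heap-based ranking are all hypothetical.

```python
import heapq
from itertools import islice
from typing import Dict, Iterator, Tuple


def effectiveness(quality: float, response_time_s: float) -> float:
    """Ratio of retrieval quality to response time (higher is better)."""
    if response_time_s <= 0:
        raise ValueError("response time must be positive")
    return quality / response_time_s


def incremental_ranking(scores: Dict[str, float]) -> Iterator[Tuple[str, float]]:
    """Yield (node, score) pairs in descending score order, one at a time.

    Hypothetical stand-in for an incremental, interruptible strategy:
    the heap is built once, and each further result costs one pop, so
    the consumer may stop after any number of results.
    """
    # heapq is a min-heap, so scores are negated to pop the largest first.
    heap = [(-score, node) for node, score in scores.items()]
    heapq.heapify(heap)
    while heap:
        neg_score, node = heapq.heappop(heap)
        yield node, -neg_score


# Assumed, illustrative numbers: an exact run versus a faster approximation.
exact = effectiveness(quality=0.30, response_time_s=2.0)
approx = effectiveness(quality=0.27, response_time_s=0.5)

# Assumed node scores for three nested elements of one XML document.
node_scores = {
    "/article[1]": 0.2,
    "/article[1]/sec[2]": 0.7,
    "/article[1]/sec[2]/p[1]": 0.5,
}
# Interrupt after the two best nodes instead of ranking everything.
top_two = list(islice(incremental_ranking(node_scores), 2))
```

With these assumed figures the approximation loses a little quality but answers four times faster, so its effectiveness under the ratio definition is higher than that of the exact run.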
Fuhr, N., Gövert, N. Retrieval quality vs. effectiveness of specificity-oriented search in XML collections. Inf Retrieval 9, 55–70 (2006). https://doi.org/10.1007/s10791-005-5721-5