Abstract
We consider the retrieval of XML-structured documents, and of passages from such documents, defined as elements of the XML structure. These are considered from the point of view of passage retrieval, as a form of document retrieval. A retrievable unit (an element chosen as defining suitable passages for retrieval) is a textual document in its own right, but may inherit information from the other parts of the same document. Again, this inheritance is defined in terms of the XML structure. All retrievable units are mapped onto a common field structure, and the ranking function is a standard document retrieval function with a suitable field weighting. A small experiment to demonstrate the idea, using INEX data, is described.
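The idea of mapping every retrievable unit onto a common field structure, with some fields inherited from elsewhere in the document, can be illustrated with a minimal sketch. It is not the authors' implementation. The element names (`article`, `title`, `sec`, `p`), the field names, the field weights, and the parameters `k1` and `b` are all assumptions chosen for illustration, and the scoring is a simplified field-weighted BM25 in the spirit of the weighted-field extension cited in the references.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical field weights; in practice these would be tuned.
FIELD_WEIGHTS = {"doc_title": 2.0, "section_title": 1.5, "body": 1.0}

# Hypothetical toy document; element names are assumptions.
DOC = """<article><title>XML retrieval</title>
<sec><title>Introduction</title><p>we study passages</p></sec>
<sec><title>Evaluation</title><p>results on a test collection of xml documents</p></sec>
</article>"""


def units_from_xml(xml_text):
    """Treat each <sec> as a retrievable unit with the common field structure."""
    root = ET.fromstring(xml_text)
    doc_title = root.findtext("title", default="")
    units = []
    for sec in root.iter("sec"):
        units.append({
            "doc_title": doc_title,  # inherited from the enclosing article
            "section_title": sec.findtext("title", default=""),
            "body": " ".join(p.text or "" for p in sec.findall("p")),
        })
    return units


def tokenize(text):
    return text.lower().split()


def weighted_counts(unit):
    """Merge a unit's fields into weighted term frequencies and a weighted length."""
    counts = Counter()
    length = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        tokens = tokenize(unit.get(field, ""))
        length += weight * len(tokens)
        for tok in tokens:
            counts[tok] += weight
    return counts, length


def rank(units, query, k1=1.2, b=0.75):
    """Score units with BM25 applied to the field-weighted term frequencies."""
    stats = [weighted_counts(u) for u in units]
    n = len(units)
    avg_len = (sum(length for _, length in stats) / n) or 1.0
    scores = []
    for counts, length in stats:
        norm = k1 * ((1 - b) + b * length / avg_len)
        score = 0.0
        for term in tokenize(query):
            tf = counts.get(term, 0.0)
            if tf == 0:
                continue
            df = sum(1 for c, _ in stats if term in c)
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
            score += idf * tf / (norm + tf)
        scores.append(score)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return order, scores


units = units_from_xml(DOC)
order, scores = rank(units, "xml passages")
for i in order:
    print(units[i]["section_title"], round(scores[i], 3))
```

Because the article title is copied into every unit, a section can match a query term that occurs only in the title, which is the effect of inheritance that the abstract describes. Combining the fields before applying the saturation function, rather than scoring each field separately and summing, keeps a term that appears in several fields from being counted as independent evidence.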
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Robertson, S., Lu, W., MacFarlane, A. (2006). XML-Structured Documents: Retrievable Units and Inheritance. In: Larsen, H.L., Pasi, G., Ortiz-Arroyo, D., Andreasen, T., Christiansen, H. (eds) Flexible Query Answering Systems. FQAS 2006. Lecture Notes in Computer Science, vol 4027. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11766254_11
DOI: https://doi.org/10.1007/11766254_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34638-8
Online ISBN: 978-3-540-34639-5
eBook Packages: Computer Science, Computer Science (R0)