Abstract
Five years of INEX have produced many competing XML element retrieval methods that make use of the document structure. So far, no clearly best method has been identified, and there is not even clear evidence of which parts of the document structure can be used to improve retrieval quality. Little research has been done on simply applying standard information retrieval techniques to XML retrieval. This paper addresses that gap: it contains a detailed analysis of the BM25 similarity measure in this context and shows that BM25 can form a viable baseline method.
... and it does!
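To make the idea of a BM25 baseline for element retrieval concrete, here is a minimal sketch that treats every XML element as its own small "document" and scores it against a query. It is an illustration under stated assumptions, not the paper's implementation. The function name `bm25_scores`, the token-list input format, the parameter values `k1=1.2` and `b=0.75` (common defaults), and the non-negative IDF variant are all choices made for this example; the abstract does not specify any of them.

```python
import math
from collections import Counter


def bm25_scores(query_terms, elements, k1=1.2, b=0.75):
    """Score each element (a list of tokens) against the query with BM25.

    Assumptions for illustration: k1 and b are common default values, and
    the IDF uses the "+1 inside the log" variant so that it is never negative.
    """
    n = len(elements)
    avg_len = sum(len(e) for e in elements) / n

    # Element frequency: in how many elements does each term occur?
    ef = Counter()
    for e in elements:
        ef.update(set(e))

    scores = []
    for e in elements:
        tf = Counter(e)
        # Length normalisation: long elements are penalised relative to average.
        norm = k1 * (1 - b + b * len(e) / avg_len)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log((n - ef[t] + 0.5) / (ef[t] + 0.5) + 1.0)
            score += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
        scores.append(score)
    return scores


# Hypothetical example: three elements, already tokenised.
elements = [
    ["xml", "retrieval", "baseline"],
    ["xml", "index"],
    ["query", "processing", "text", "data"],
]
scores = bm25_scores(["retrieval", "baseline"], elements)
ranking = sorted(range(len(elements)), key=lambda i: scores[i], reverse=True)
```

Only the first element contains the query terms, so it is the only one with a non-zero score and it ranks first. A real element retrieval system would also have to decide which elements to index and how to handle overlapping (nested) elements, which this sketch ignores.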
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dopichaj, P. (2008). The Simplest XML Retrieval Baseline That Could Possibly Work. In: Fuhr, N., Kamps, J., Lalmas, M., Trotman, A. (eds) Focused Access to XML Documents. INEX 2007. Lecture Notes in Computer Science, vol 4862. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85902-4_7
DOI: https://doi.org/10.1007/978-3-540-85902-4_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85901-7
Online ISBN: 978-3-540-85902-4
eBook Packages: Computer Science, Computer Science (R0)