Abstract
This is the first year in which the Centre for Interactive Systems Research has participated in INEX. Building on a newly developed XML indexing and retrieval system built on Okapi, we extend Robertson’s field-weighted BM25F document retrieval function to an element-level retrieval function, BM25E. In this paper, we introduce the new function and our experimental method in detail, and then show how we tuned the weights for our selected fields using the INEX 2004 topics and assessments. Based on the tuned models, we submitted runs for CO.Thorough and CO.FetchBrowse. The methods we propose show real promise. Existing problems and future work are also discussed.
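As a rough illustration of the field-weighted scoring idea mentioned in the abstract, the sketch below follows the simple BM25F formulation of Robertson, Zaragoza and Taylor: per-field term frequencies are combined with field weights before the usual BM25 saturation is applied. It is not the paper's BM25E. The function name, the field names, the weights, and the values of k1 and b are illustrative assumptions, and the scored unit is simply called an "element" here.

```python
import math


def bm25f_score(query_terms, field_tfs, field_lengths, weights,
                avg_weighted_len, doc_freq, num_units, k1=1.2, b=0.75):
    """Score one element with a simple field-weighted BM25 (illustrative only).

    field_tfs:        {field: {term: raw term frequency in that field}}
    field_lengths:    {field: length of that field in tokens}
    weights:          {field: weight}, hypothetical values to be tuned
    avg_weighted_len: average weighted length over the collection
    doc_freq:         {term: number of units containing the term}
    num_units:        total number of units in the collection
    """
    # Weighted length: each field contributes its length times its weight.
    weighted_len = sum(w * field_lengths.get(f, 0) for f, w in weights.items())
    # Standard BM25 length normalisation, applied to the weighted length.
    norm = k1 * ((1.0 - b) + b * weighted_len / avg_weighted_len)

    score = 0.0
    for term in query_terms:
        # Combine field term frequencies linearly BEFORE saturation.
        tf = sum(w * field_tfs.get(f, {}).get(term, 0)
                 for f, w in weights.items())
        if tf == 0:
            continue
        n = doc_freq.get(term, 0)
        # Robertson/Sparck Jones weight without relevance information.
        idf = math.log((num_units - n + 0.5) / (n + 0.5))
        score += idf * tf * (k1 + 1.0) / (tf + norm)
    return score


# Hypothetical element with a short title field and a longer body field.
example_tfs = {"title": {"xml": 1}, "body": {"xml": 2, "retrieval": 1}}
example_lengths = {"title": 3, "body": 50}
example_df = {"xml": 10, "retrieval": 40}

flat = bm25f_score(["xml"], example_tfs, example_lengths,
                   {"title": 1.0, "body": 1.0}, 53.0, example_df, 1000, b=0.0)
boosted = bm25f_score(["xml"], example_tfs, example_lengths,
                      {"title": 5.0, "body": 1.0}, 53.0, example_df, 1000, b=0.0)
```

With length normalisation switched off (b = 0), raising the title weight raises the combined term frequency and therefore the score. With b > 0 the weighted length grows as well, so the net effect depends on the collection statistics, which is one reason such weights are tuned against assessments rather than set by hand.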
References
Deutsch, A., Fernandez, M., Suciu, D.: Storing semistructured data with STORED. In: Proc. SIGMOD (1999)
Harding, J., Li, Q., Moon, B.: XISS/R: XML Indexing and Storage System Using RDBMS. In: Proceedings of the 29th VLDB Conference (2003)
Software AG. Tamino XML database, http://www.softwareag.com/tamino/
XYZFind. XML Database, http://www.xyzfind.com
Fuhr, N., Großjohann, K.: XIRQL: A Query Language for Information Retrieval in XML Documents. In: Research and Development in Information Retrieval (2001)
Wolff, J.E., Florke, H., Cremers, A.B.: Searching and Browsing Collections of Structural Information. In: Proc. IEEE Forum on Research and Technology Advances in Digital Libraries (2000)
Schlieder, T., Meuss, H.: Querying and Ranking XML Documents. Journal of the American Society for Information Science and Technology, Special Topic Issue on XML and Information Retrieval (2002)
Schlieder, T.: Similarity Search in XML Data using Cost-Based Query Transformations. In: Proc. 4th Intern. Workshop on the Web and Databases (2001)
Theobald, A., Weikum, G.: The index-based XXL search engine for querying XML data with relevance ranking. In: Jensen, C.S., Jeffery, K., Pokorný, J., Šaltenis, S., Bertino, E., Böhm, K., Jarke, M. (eds.) EDBT 2002. LNCS, vol. 2287, p. 477. Springer, Heidelberg (2002)
Robertson, S., Zaragoza, H., Taylor, M.: Simple BM25 Extension to Multiple Weighted Fields. In: CIKM 2004 (2004)
Wilkinson, R.: Effective retrieval of structured documents. In: Research and Development in Information Retrieval (1994)
Ogilvie, P., Callan, J.: Combining document representations for known item search. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2003 (2003)
Kraaij, W., Westerveld, T., Hiemstra, D.: The importance of prior probabilities for entry page search. In: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2002)
Myaeng, S., Jang, D., Kim, M., Zhoo, Z.: A flexible model for retrieval of SGML documents. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (1998)
Clarke, C.L.A., Tilker, P.L.: MultiText experiments for INEX 2004. In: Fuhr, N., Lalmas, M., Malik, S., Szlávik, Z. (eds.) INEX 2004. LNCS, vol. 3493, pp. 85–87. Springer, Heidelberg (2005)
Vittaut, J.-N., Piwowarski, B., Gallinari, P.: An algebra for structured queries in bayesian networks. In: Fuhr, N., Lalmas, M., Malik, S., Szlávik, Z. (eds.) INEX 2004. LNCS, vol. 3493, pp. 100–112. Springer, Heidelberg (2005)
Kekäläinen, J., Junkkari, M., Arvola, P., Aalto, T.: TRIX 2004 – struggling with the overlap. In: Fuhr, N., Lalmas, M., Malik, S., Szlávik, Z. (eds.) INEX 2004. LNCS, vol. 3493, pp. 127–139. Springer, Heidelberg (2005)
Larson, R.R.: Cheshire II at INEX 2004: Fusion and feedback for the adhoc and heterogeneous tracks. In: Fuhr, N., Lalmas, M., Malik, S., Szlávik, Z. (eds.) INEX 2004. LNCS, vol. 3493, pp. 322–336. Springer, Heidelberg (2005)
Ogilvie, P., Callan, J.: Hierarchical language models for XML component retrieval. In: Fuhr, N., Lalmas, M., Malik, S., Szlávik, Z. (eds.) INEX 2004. LNCS, vol. 3493, pp. 224–237. Springer, Heidelberg (2005)
Trotman, A.: Choosing document structure weights. Information Processing & Management 41(2), 243–264 (2005)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lu, W., Robertson, S., MacFarlane, A. (2006). Field-Weighted XML Retrieval Based on BM25. In: Fuhr, N., Lalmas, M., Malik, S., Kazai, G. (eds) Advances in XML Information Retrieval and Evaluation. INEX 2005. Lecture Notes in Computer Science, vol 3977. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-34963-1_12
DOI: https://doi.org/10.1007/978-3-540-34963-1_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34962-4
Online ISBN: 978-3-540-34963-1
eBook Packages: Computer Science, Computer Science (R0)