Abstract
We consider the Structured Information Retrieval task, which consists of ranking nested textual units according to their relevance to a given query in a collection of structured documents. We propose to improve the performance of a baseline Information Retrieval system by using a learned ranking algorithm that operates on scores computed from document elements and from their local structural context. This model is trained to optimize a Ranking Loss criterion using a training set of annotated examples composed of queries and relevance judgments on a subset of the document elements. The model can produce a ranked list of document elements that fulfills a given information need expressed in the query. We analyze the performance of our algorithm on the INEX collection and compare it to a baseline model, which is an adaptation of Okapi to Structured Information Retrieval.
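The abstract does not give the feature set, the form of the scoring function, or the optimizer, so the following is only a minimal sketch of the general idea under stated assumptions: each element is described by three hypothetical scores (its own, its parent's, and its document's), the ranking function is a linear combination of them, and the Ranking Loss (the fraction of mis-ordered relevant/non-relevant pairs) is minimized through an exponential surrogate with plain gradient descent. The toy numbers are invented for illustration and are not taken from the paper or from INEX.

```python
import math

def score(weights, features):
    """Linear combination of an element's scores (assumed model form)."""
    return sum(w * x for w, x in zip(weights, features))

def ranking_loss(weights, pairs):
    """Fraction of (relevant, non-relevant) pairs ranked in the wrong order."""
    wrong = sum(1 for rel, non in pairs
                if score(weights, rel) <= score(weights, non))
    return wrong / len(pairs)

def train(pairs, dim, learning_rate=0.1, epochs=200):
    """Gradient descent on the surrogate mean of exp(-(s(rel) - s(non)))."""
    weights = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for rel, non in pairs:
            diff = [r - n for r, n in zip(rel, non)]
            # d/dw exp(-w.diff) = -exp(-w.diff) * diff
            coeff = -math.exp(-score(weights, diff))
            for i in range(dim):
                grad[i] += coeff * diff[i]
        weights = [w - learning_rate * g / len(pairs)
                   for w, g in zip(weights, grad)]
    return weights

# Hypothetical features for one query:
# (element score, parent score, document score)
relevant = [(0.9, 0.4, 0.5), (0.6, 0.8, 0.7)]
non_relevant = [(0.7, 0.1, 0.2), (0.3, 0.3, 0.4)]
pairs = [(rel, non) for rel in relevant for non in non_relevant]

baseline_weights = [1.0, 0.0, 0.0]  # rank by the element score alone
learned_weights = train(pairs, dim=3)

print("baseline ranking loss:", ranking_loss(baseline_weights, pairs))
print("learned ranking loss:", ranking_loss(learned_weights, pairs))
```

In this toy setting the element score alone mis-orders one pair, while the learned combination can use the parent and document scores to order all four pairs correctly. That is the role the abstract assigns to the local structural context, although the actual features and training procedure in the paper may differ.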
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vittaut, JN., Gallinari, P. (2006). Machine Learning Ranking for Structured Information Retrieval. In: Lalmas, M., MacFarlane, A., Rüger, S., Tombros, A., Tsikrika, T., Yavlinsky, A. (eds) Advances in Information Retrieval. ECIR 2006. Lecture Notes in Computer Science, vol 3936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11735106_30
DOI: https://doi.org/10.1007/11735106_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33347-0
Online ISBN: 978-3-540-33348-7
eBook Packages: Computer Science, Computer Science (R0)