Abstract
Pseudo test collections are automatically generated to provide training material for learning to rank methods. We propose a method for generating pseudo test collections in the domain of digital libraries, where data is relatively sparse but comes with rich annotations. Our intuition is that documents are annotated to make them easier to find for certain information needs. We use these annotations and the associated documents as a source of pairs of queries and relevant documents. We investigate how learning to rank performance varies when we use different methods for sampling annotations, and show how our pseudo test collection ranks systems compared to editorial topics with editorial judgments. Our results demonstrate that it is possible to train a learning to rank algorithm on generated pseudo judgments. In some cases, performance is on par with learning on manually obtained ground truth.
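As a rough illustration of the core idea, the sketch below builds pseudo query-relevance pairs by inverting an annotation-document relation and sampling annotations as pseudo queries. This is a minimal sketch under assumed data structures (documents as `(doc_id, annotations)` pairs) and a simple uniform sampling strategy; the paper compares several sampling methods, so this should not be read as the authors' actual procedure.

```python
import random
from collections import defaultdict


def build_pseudo_test_collection(documents, num_queries, seed=42):
    """Sample annotations as pseudo queries; the documents carrying an
    annotation become its pseudo-relevant documents.

    `documents` is assumed to be an iterable of (doc_id, annotations) pairs,
    where `annotations` is a list of annotation strings. This structure is
    illustrative, not the format used in the paper.
    """
    rng = random.Random(seed)

    # Invert the annotation-document relation.
    docs_per_annotation = defaultdict(set)
    for doc_id, annotations in documents:
        for annotation in annotations:
            docs_per_annotation[annotation].add(doc_id)

    # Sample annotations uniformly at random as pseudo queries.
    # (Uniform sampling is only a stand-in for the sampling methods
    # the paper actually investigates.)
    candidates = list(docs_per_annotation)
    sampled = rng.sample(candidates, min(num_queries, len(candidates)))

    # Each sampled annotation yields a pseudo query with pseudo-relevant docs,
    # which can then serve as training judgments for a learning to rank method.
    return {annotation: docs_per_annotation[annotation] for annotation in sampled}


if __name__ == "__main__":
    docs = [
        ("d1", ["labour market", "unemployment"]),
        ("d2", ["unemployment", "social policy"]),
        ("d3", ["social policy"]),
    ]
    for query, relevant in build_pseudo_test_collection(docs, num_queries=2).items():
        print(query, "->", sorted(relevant))
```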
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Berendsen, R., Tsagkias, M., de Rijke, M., Meij, E. (2012). Generating Pseudo Test Collections for Learning to Rank Scientific Articles. In: Catarci, T., Forner, P., Hiemstra, D., Peñas, A., Santucci, G. (eds) Information Access Evaluation. Multilinguality, Multimodality, and Visual Analytics. CLEF 2012. Lecture Notes in Computer Science, vol 7488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33247-0_6
DOI: https://doi.org/10.1007/978-3-642-33247-0_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33246-3
Online ISBN: 978-3-642-33247-0