Abstract
Relevance evaluation is an important topic in Web search engine research. Traditional evaluation methods rely on a large amount of human effort, which makes the process extremely time-consuming in practice. By analyzing large-scale user query logs and click-through data, we propose a performance evaluation method that automatically generates large-scale Web search topics and answer sets under the Cranfield framework. These query-to-answer pairs are used directly in relevance evaluation with several widely adopted precision- and recall-related retrieval performance metrics. Beyond analyzing the logs of a single search engine, we propose user behavior models over the click-through logs of multiple search engines to reduce potential bias toward any one engine. Experimental results show that the evaluation results are similar to those obtained by traditional human annotation, while our method avoids the bias and subjectivity of manual expert judgments in the traditional approach.
Supported by the Chinese National Key Foundation Research & Development Plan (2004CB318108), the Natural Science Foundation (60621062, 60503064, 60736044), and the National 863 High Technology Project (2006AA01Z141).
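The following is a minimal sketch, not the authors' implementation, of the evaluation pipeline the abstract describes: answer sets are derived from aggregated click-through logs and a ranked result list is then scored with precision/recall-style metrics. The log format, the `min_clicks` threshold, and all function names are assumptions introduced here for illustration; the paper's own topic-generation and multi-engine behavior models are more involved.

```python
# Sketch only: derive query-to-answer pairs from click-through logs and score
# a ranked result list with precision@k and average precision. The threshold
# and data layout are illustrative assumptions, not the paper's method.
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

def build_answer_sets(click_log: Iterable[Tuple[str, str]],
                      min_clicks: int = 10) -> Dict[str, Set[str]]:
    """Treat URLs clicked at least `min_clicks` times for a query as its answer set."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for query, url in click_log:                 # one (query, clicked URL) record per click
        counts[query][url] += 1
    return {q: {u for u, c in urls.items() if c >= min_clicks}
            for q, urls in counts.items()}

def precision_at_k(ranked: List[str], answers: Set[str], k: int) -> float:
    """Fraction of the top-k returned URLs that are in the answer set."""
    return sum(1 for u in ranked[:k] if u in answers) / k

def average_precision(ranked: List[str], answers: Set[str]) -> float:
    """Average of precision values at each rank where a relevant URL appears."""
    hits, score = 0, 0.0
    for rank, url in enumerate(ranked, start=1):
        if url in answers:
            hits += 1
            score += hits / rank
    return score / len(answers) if answers else 0.0

# Usage: answer sets come from the logs, ranked lists from the engine under test.
log = [("airs 2009", "http://example.org/a")] * 12 + [("airs 2009", "http://example.org/b")] * 3
answers = build_answer_sets(log)
ranked = ["http://example.org/a", "http://example.org/c"]
print(precision_at_k(ranked, answers["airs 2009"], k=2),
      average_precision(ranked, answers["airs 2009"]))
```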
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cen, R., Liu, Y., Zhang, M., Ru, L., Ma, S. (2009). Automatic Search Engine Performance Evaluation with the Wisdom of Crowds. In: Lee, G.G., et al. Information Retrieval Technology. AIRS 2009. Lecture Notes in Computer Science, vol 5839. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04769-5_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04768-8
Online ISBN: 978-3-642-04769-5