
Automatic Search Engine Performance Evaluation with the Wisdom of Crowds

  • Conference paper
Information Retrieval Technology (AIRS 2009)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 5839)


Abstract

Relevance evaluation is an important topic in Web search engine research. Traditional evaluation methods rely on a huge amount of human effort, which makes the process extremely time-consuming in practice. By analyzing large-scale user query logs and click-through data, we propose a performance evaluation method that fully automatically generates large-scale Web search topics and answer sets under the Cranfield framework. These query-to-answer pairs are used directly in relevance evaluation with several widely adopted precision- and recall-related retrieval performance metrics. Beyond analyzing the logs of a single search engine, we propose user behavior models over the click-through logs of multiple search engines to reduce potential bias among the engines. Experimental results show that the evaluation results are similar to those obtained by traditional human annotation, while our method avoids the bias and subjectivity of the manual expert judgments used in traditional approaches.
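To make the evaluation pipeline concrete, here is a minimal sketch (not the paper's actual algorithm) of the two steps the abstract describes: building query-to-answer sets from click-through logs and scoring an engine's ranked results against them. The click-frequency threshold, the `min_clicks` parameter, and the use of precision@k as the only metric are illustrative assumptions; the paper's answer-set construction and its multi-engine user behavior models are more elaborate.

```python
# Sketch of click-through-based evaluation under Cranfield-style assumptions.
# Assumption (not from the paper): answer sets are URLs clicked at least
# `min_clicks` times for a query; the only metric shown is precision@k.
from collections import Counter, defaultdict


def build_answer_sets(click_log, min_clicks=5):
    """click_log: iterable of (query, clicked_url) pairs from a query log.

    Returns {query: set of URLs clicked at least `min_clicks` times}, a crude
    stand-in for the automatically generated query-to-answer pairs."""
    counts = defaultdict(Counter)
    for query, url in click_log:
        counts[query][url] += 1
    return {
        q: {u for u, c in urls.items() if c >= min_clicks}
        for q, urls in counts.items()
    }


def precision_at_k(ranked_urls, answers, k=10):
    """Fraction of the top-k results that appear in the answer set."""
    top_k = ranked_urls[:k]
    if not top_k:
        return 0.0
    return sum(1 for u in top_k if u in answers) / len(top_k)


def evaluate_engine(engine_results, answer_sets, k=10):
    """engine_results: {query: ranked list of URLs returned by the engine}.

    Averages precision@k over queries that have a non-empty answer set."""
    scores = [
        precision_at_k(engine_results[q], answers, k)
        for q, answers in answer_sets.items()
        if answers and q in engine_results
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

In the paper itself, several precision/recall-related metrics are computed over the automatically generated pairs, and clicks from multiple search engines are modeled jointly to reduce per-engine bias rather than relying on a single engine's log as in this sketch.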

Supported by the Chinese National Key Foundation Research & Development Plan (2004CB318108), the Natural Science Foundation (60621062, 60503064, 60736044), and the National 863 High Technology Project (2006AA01Z141).




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cen, R., Liu, Y., Zhang, M., Ru, L., Ma, S. (2009). Automatic Search Engine Performance Evaluation with the Wisdom of Crowds. In: Lee, G.G., et al. (eds.) Information Retrieval Technology. AIRS 2009. Lecture Notes in Computer Science, vol. 5839. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04769-5_31


  • DOI: https://doi.org/10.1007/978-3-642-04769-5_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04768-8

  • Online ISBN: 978-3-642-04769-5

  • eBook Packages: Computer Science (R0)
