DOI: 10.1145/2009916.2009939

Active learning to maximize accuracy vs. effort in interactive information retrieval

Published: 24 July 2011

ABSTRACT

We consider an interactive information retrieval task in which the user is interested in finding several to many relevant documents with minimal effort. Given an initial document ranking, user interaction with the system produces relevance feedback (RF), which the system then uses to revise the ranking. This interactive process repeats until the user terminates the search. To maximize accuracy relative to user effort, we propose an active learning strategy: at each iteration, the document whose relevance is maximally uncertain to the system is slotted high into the ranking in order to obtain user feedback for it. Simulated feedback on the Robust04 TREC collection shows that our active learning approach dominates several standard RF baselines relative to the amount of feedback provided by the user. Evaluation on Robust04 under noisy feedback and on LETOR collections further demonstrates the effectiveness of active learning, as well as the value of negative feedback in this task scenario.
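The selection rule the abstract describes lends itself to a short sketch. The following is a minimal, hypothetical illustration of such an uncertainty-driven feedback loop, not the authors' implementation: it assumes a logistic-regression relevance model over TF-IDF features (via scikit-learn) and measures uncertainty as the distance of the predicted relevance probability from 0.5; the function interactive_feedback_loop, the user_judges callback, and all parameter names are invented for this example.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    def interactive_feedback_loop(documents, user_judges, seed_labels, n_rounds=10):
        # documents:   list of raw document texts retrieved for one query
        # user_judges: callable index -> 0/1, simulating the user's judgment
        # seed_labels: dict {index: 0/1}; must contain both classes to train
        X = TfidfVectorizer().fit_transform(documents)
        labels = dict(seed_labels)
        p_rel = np.full(len(documents), 0.5)

        for _ in range(n_rounds):
            judged = sorted(labels)
            model = LogisticRegression().fit(X[judged], [labels[i] for i in judged])
            p_rel = model.predict_proba(X)[:, 1]      # P(relevant) per document

            # Select the unjudged document the model is most uncertain about.
            unjudged = [i for i in range(len(documents)) if i not in labels]
            if not unjudged:
                break
            pick = min(unjudged, key=lambda i: abs(p_rel[i] - 0.5))

            # Slotting the pick high in the ranking is simulated here by
            # directly asking the (simulated) user for its relevance label.
            labels[pick] = user_judges(pick)

        return np.argsort(-p_rel)                     # best-first ranking

With simulated feedback, as in the paper's Robust04 experiments, user_judges would simply look up the selected document's relevance label in the TREC qrels; each loop iteration corresponds to one round of user feedback.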


Published in

SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
July 2011, 1374 pages
ISBN: 9781450307574
DOI: 10.1145/2009916
Copyright © 2011 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers

• research-article

Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions, 20%
