DOI: 10.1145/2009916.2009939

Active learning to maximize accuracy vs. effort in interactive information retrieval

Published: 24 July 2011

ABSTRACT

We consider an interactive information retrieval task in which the user is interested in finding several to many relevant documents with minimal effort. Given an initial document ranking, user interaction with the system produces relevance feedback (RF), which the system then uses to revise the ranking. This interactive process repeats until the user terminates the search. To maximize accuracy relative to user effort, we propose an active learning strategy: at each iteration, the document whose relevance is maximally uncertain to the system is slotted high into the ranking in order to obtain user feedback for it. Simulated feedback on the Robust04 TREC collection shows that our active learning approach dominates several standard RF baselines relative to the amount of feedback provided by the user. Evaluation on Robust04 under noisy feedback and on LETOR collections further demonstrates the effectiveness of active learning, as well as the value of negative feedback in this task scenario.
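The selection rule the abstract describes lends itself to a short sketch. The following is a minimal, hypothetical illustration of such an uncertainty-driven feedback loop, not the authors' implementation: it assumes a logistic-regression relevance model over TF-IDF features (via scikit-learn) and measures uncertainty as the distance of the predicted relevance probability from 0.5; the function interactive_feedback_loop, the user_judges callback, and all parameter names are invented for this example.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    def interactive_feedback_loop(documents, user_judges, seed_labels, n_rounds=10):
        # documents:   list of raw document texts retrieved for one query
        # user_judges: callable index -> 0/1, simulating the user's judgment
        # seed_labels: dict {index: 0/1}; must contain both classes to train
        X = TfidfVectorizer().fit_transform(documents)
        labels = dict(seed_labels)
        p_rel = np.full(len(documents), 0.5)

        for _ in range(n_rounds):
            judged = sorted(labels)
            model = LogisticRegression().fit(X[judged], [labels[i] for i in judged])
            p_rel = model.predict_proba(X)[:, 1]      # P(relevant) per document

            # Select the unjudged document the model is most uncertain about.
            unjudged = [i for i in range(len(documents)) if i not in labels]
            if not unjudged:
                break
            pick = min(unjudged, key=lambda i: abs(p_rel[i] - 0.5))

            # Slotting the pick high in the ranking is simulated here by
            # directly asking the (simulated) user for its relevance label.
            labels[pick] = user_judges(pick)

        return np.argsort(-p_rel)                     # best-first ranking

With simulated feedback, as in the paper's Robust04 experiments, user_judges would simply look up the selected document's relevance label in the TREC qrels; each loop iteration corresponds to one round of user feedback.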


Published in

SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
July 2011, 1374 pages
ISBN: 9781450307574
DOI: 10.1145/2009916
Copyright © 2011 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers

• research-article

Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions, 20%
