ABSTRACT
Proactive learning is a generalization of active learning designed to relax unrealistic assumptions and thereby reach practical applications. Active learning seeks to select the most informative unlabeled instances and ask an omniscient oracle for their labels, so as to retrain the learner and maximize accuracy. However, the oracle is assumed to be infallible (never wrong), indefatigable (always answers), individual (the only oracle), and insensitive to cost (labels are free or uniformly priced). Proactive learning relaxes all four of these assumptions, relying on a decision-theoretic approach to jointly select the optimal oracle and instance, by casting the problem as a utility optimization problem subject to a budget constraint. Results on multi-oracle optimization over several data sets demonstrate that our approach outperforms single-imperfect-oracle baselines in most cases.
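The joint oracle-and-instance selection described above can be sketched as a greedy utility-per-cost optimization. This is an illustrative simplification, not the paper's exact formulation: the scoring function (instance informativeness discounted by oracle reliability, normalized by query cost) and all function and parameter names here are assumptions for the sketch.

```python
import numpy as np

def proactive_select(info_value, reliability, cost, budget):
    """Greedily pick (instance, oracle) pairs that maximize expected
    utility per unit cost until the labeling budget is exhausted.

    info_value : (n_instances,) estimated informativeness of each instance
    reliability: (n_oracles,)   probability each oracle answers correctly
    cost       : (n_oracles,)   price each oracle charges per query
    """
    n_inst, n_orc = len(info_value), len(reliability)
    # Expected utility of asking oracle k about instance i, discounted
    # by the chance the oracle is wrong, normalized by the query cost.
    score = np.outer(info_value, reliability) / cost  # shape (n_inst, n_orc)
    chosen, spent = [], 0.0
    remaining = set(range(n_inst))
    while remaining:
        # Best remaining (instance, oracle) pair by utility-per-cost.
        i, k = max(((i, k) for i in remaining for k in range(n_orc)),
                   key=lambda p: score[p])
        if spent + cost[k] > budget:
            break
        chosen.append((i, k))
        spent += cost[k]
        remaining.remove(i)
    return chosen, spent

# Example: 4 instances, 2 oracles (one cheap but noisy, one costly but reliable).
pairs, spent = proactive_select(
    info_value=np.array([0.9, 0.5, 0.8, 0.2]),
    reliability=np.array([0.7, 0.95]),
    cost=np.array([1.0, 3.0]),
    budget=5.0,
)
```

Under this per-cost scoring the cheap oracle wins every query in the toy example; a reliable-but-expensive oracle is selected only when the accuracy gain outweighs its higher price relative to the remaining budget.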
Index Terms
- Proactive learning: cost-sensitive active learning with multiple imperfect oracles