ABSTRACT
Proactive learning is a generalization of active learning designed to relax unrealistic assumptions and thereby reach practical applications. Active learning seeks to select the most informative unlabeled instances and ask an omniscient oracle for their labels, so as to retrain the learner and maximize accuracy. However, the oracle is assumed to be infallible (never wrong), indefatigable (always answers), individual (the only oracle), and insensitive to cost (labels are free or uniformly priced). Proactive learning relaxes all four of these assumptions, relying on a decision-theoretic approach to jointly select the optimal oracle and instance, by casting the problem as a utility optimization problem subject to a budget constraint. Results on multi-oracle optimization over several data sets demonstrate that our approach outperforms single-imperfect-oracle baselines in most cases.
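The joint oracle-and-instance selection described above can be sketched as a greedy utility-per-cost optimization. This is an illustrative simplification, not the paper's exact formulation: the scoring function (instance informativeness discounted by oracle reliability, normalized by query cost) and all function and parameter names here are assumptions for the sketch.

```python
import numpy as np

def proactive_select(info_value, reliability, cost, budget):
    """Greedily pick (instance, oracle) pairs that maximize expected
    utility per unit cost until the labeling budget is exhausted.

    info_value : (n_instances,) estimated informativeness of each instance
    reliability: (n_oracles,)   probability each oracle answers correctly
    cost       : (n_oracles,)   price each oracle charges per query
    """
    n_inst, n_orc = len(info_value), len(reliability)
    # Expected utility of asking oracle k about instance i, discounted
    # by the chance the oracle is wrong, normalized by the query cost.
    score = np.outer(info_value, reliability) / cost  # shape (n_inst, n_orc)
    chosen, spent = [], 0.0
    remaining = set(range(n_inst))
    while remaining:
        # Best remaining (instance, oracle) pair by utility-per-cost.
        i, k = max(((i, k) for i in remaining for k in range(n_orc)),
                   key=lambda p: score[p])
        if spent + cost[k] > budget:
            break
        chosen.append((i, k))
        spent += cost[k]
        remaining.remove(i)
    return chosen, spent

# Example: 4 instances, 2 oracles (one cheap but noisy, one costly but reliable).
pairs, spent = proactive_select(
    info_value=np.array([0.9, 0.5, 0.8, 0.2]),
    reliability=np.array([0.7, 0.95]),
    cost=np.array([1.0, 3.0]),
    budget=5.0,
)
```

Under this per-cost scoring the cheap oracle wins every query in the toy example; a reliable-but-expensive oracle is selected only when the accuracy gain outweighs its higher price relative to the remaining budget.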
Index Terms
- Proactive learning: cost-sensitive active learning with multiple imperfect oracles