DOI: 10.1145/2501511.2501524
research-article

Towards anytime active learning: interrupting experts to reduce annotation costs

Published: 11 August 2013

Abstract

Many active learning methods factor annotation cost or expert quality into how they select data for annotation. While these methods model expert quality, availability, or expertise, they have no direct influence on any of these elements. We present a novel framework built upon decision-theoretic active learning that allows the learner to directly control label quality by allocating a time budget to each annotation. We show that our method improves the efficiency of the active learner through an interruption mechanism that trades off the error induced by interrupting the expert against the cost of annotation. Our simulation experiments on three document classification tasks show that some interruption is almost always better than none, but that the optimal interruption time varies by dataset.
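The trade-off described in the abstract can be illustrated with a small toy calculation. This is only a sketch under invented assumptions, not the paper's framework: the accuracy curve `label_accuracy`, the utility `net_utility`, the helper `best_interruption_time`, and every parameter value are hypothetical and chosen purely for illustration.

```python
import math


def label_accuracy(t, rate=0.5):
    """Hypothetical annotator model (an assumption, not from the paper):
    accuracy rises from 0.5 (a coin flip) toward 1.0 as the expert is
    given more time t, in seconds, before being interrupted."""
    return 1.0 - 0.5 * math.exp(-rate * t)


def net_utility(t, gain_if_correct=1.0, loss_if_wrong=1.0, cost_per_second=0.02):
    """Expected benefit of a label obtained after t seconds, minus its cost.
    The gain, loss, and cost values are illustrative placeholders."""
    p = label_accuracy(t)
    expected_benefit = p * gain_if_correct - (1.0 - p) * loss_if_wrong
    return expected_benefit - cost_per_second * t


def best_interruption_time(candidate_times, **kwargs):
    """Return the candidate time budget with the highest net utility."""
    return max(candidate_times, key=lambda t: net_utility(t, **kwargs))


candidates = [1, 2, 5, 10, 20, 60]
best = best_interruption_time(candidates)
pricier = best_interruption_time(candidates, cost_per_second=0.2)
print(best, pricier)
```

In this toy setting, interrupting the expert at a moderate time budget gives a higher net utility than letting the annotation run to the longest budget, and raising the per-second cost moves the preferred budget earlier. How the real method estimates label quality and value is described in the paper itself, not here.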


Published In

IDEA '13: Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics
August 2013
104 pages
ISBN:9781450323291
DOI:10.1145/2501511

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. active learning
  2. anytime algorithms
  3. empirical evaluation
  4. value of information

Conference

KDD '13

Acceptance Rates

IDEA '13 Paper Acceptance Rate: 11 of 25 submissions, 44%
Overall Acceptance Rate: 11 of 25 submissions, 44%

Cited By

  • (2017) Active learning. Data Mining and Knowledge Discovery 31:2 (287-313). DOI: 10.1007/s10618-016-0469-7. Online publication date: 1-Mar-2017
  • (2017) Active inference for dynamic Bayesian networks with an application to tissue engineering. Knowledge and Information Systems 50:3 (917-943). DOI: 10.1007/s10115-016-0963-7. Online publication date: 1-Mar-2017
  • (2015) Assessing diagnostic complexity. Computers in Biology and Medicine 62:C (294-305). DOI: 10.1016/j.compbiomed.2015.01.013. Online publication date: 1-Jul-2015
  • (2015) Error-Correction and Aggregation in Crowd-Sourcing of Geopolitical Incident Information. Social Computing, Behavioral-Cultural Modeling, and Prediction (381-387). DOI: 10.1007/978-3-319-16268-3_47. Online publication date: 17-Mar-2015
  • (2014) Towards Achieving Diagnostic Consensus in Medical Image Interpretation. 2014 IEEE International Conference on Data Mining Workshop (771-780). DOI: 10.1109/ICDMW.2014.134. Online publication date: Dec-2014
