Towards anytime active learning: interrupting experts to reduce annotation costs

Authors:

Maria E. Ramirez-Loaiza,

Mustafa Bilgic

IDEA '13: Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics

Pages 87 - 94

https://doi.org/10.1145/2501511.2501524

Published: 11 August 2013

Abstract

Many active learning methods use annotation cost or expert quality as part of their framework to select the best data for annotation. While these methods model expert quality, availability, or expertise, they have no direct influence on any of these elements. We present a novel framework built upon decision-theoretic active learning that allows the learner to directly control label quality by allocating a time budget to each annotation. We show that our method is able to improve performance efficiency of the active learner through an interruption mechanism trading off the induced error with the cost of annotation. Our simulation experiments on three document classification tasks show that some interruption is almost always better than none, but that the optimal interruption time varies by dataset.
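The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation: the accuracy curve, the per-second cost, the gain and loss values, and every function name below are hypothetical and chosen only to show how a learner might pick an interruption time by weighing the error induced by a rushed label against the cost of annotation time.

```python
import math

def label_accuracy(t, rate=0.5):
    """Hypothetical model: probability that a binary label is correct
    after the expert has spent t seconds on the document.
    Starts at 0.5 (a coin flip) and approaches 1.0 as t grows."""
    return 1.0 - 0.5 * math.exp(-rate * t)

def net_utility(t, gain_correct=1.0, loss_wrong=1.0,
                cost_per_second=0.05, rate=0.5):
    """Expected benefit of a label obtained after t seconds,
    minus the (assumed linear) cost of the expert's time."""
    p = label_accuracy(t, rate)
    expected_gain = p * gain_correct - (1.0 - p) * loss_wrong
    return expected_gain - cost_per_second * t

def best_interruption_time(candidates, **kwargs):
    """Pick the candidate time budget with the highest net utility."""
    return max(candidates, key=lambda t: net_utility(t, **kwargs))

# Candidate time budgets in seconds (illustrative values only).
candidates = [1, 2, 5, 10, 20, 60]
best = best_interruption_time(candidates)

# A steeper accuracy curve (experts become accurate sooner) shifts the
# preferred interruption point earlier.
best_fast = best_interruption_time(candidates, rate=2.0)
```

Under these assumed curves, an intermediate budget scores higher than letting the expert read for the full 60 seconds, and changing `rate` moves the preferred budget. This mirrors the abstract's observation that the best interruption time depends on the dataset.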

Index Terms

Towards anytime active learning: interrupting experts to reduce annotation costs
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Classification and regression trees

Published In

IDEA '13: Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics

August 2013

104 pages

ISBN:9781450323291

DOI:10.1145/2501511

Editors: Duen Horng Chau (Georgia Tech), Jilles Vreeken (University of Antwerp), Matthijs van Leeuwen (KU Leuven), Christos Faloutsos (Carnegie Mellon University)

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

KDD '13: The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 11, 2013

Chicago, Illinois

Acceptance Rates

IDEA '13 Paper Acceptance Rate: 11 of 25 submissions, 44%

Overall Acceptance Rate: 11 of 25 submissions, 44%

Cited By

Ramirez-Loaiza M, Sharma M, Kumar G, Bilgic M (2017). Active learning. Data Mining and Knowledge Discovery 31(2):287-313. Online publication date: 1-Mar-2017. https://dl.acm.org/doi/10.1007/s10618-016-0469-7

Komurlu C, Shao J, Akar B, Bayrak E, Brey E, Cinar A, Bilgic M (2017). Active inference for dynamic Bayesian networks with an application to tissue engineering. Knowledge and Information Systems 50(3):917-943. Online publication date: 1-Mar-2017. https://dl.acm.org/doi/10.1007/s10115-016-0963-7

Zamacona J, Niehaus R, Rasin A, Furst J, Raicu D (2015). Assessing diagnostic complexity. Computers in Biology and Medicine 62(C):294-305. Online publication date: 1-Jul-2015. https://dl.acm.org/doi/10.1016/j.compbiomed.2015.01.013

Ororbia A, Xu Y, D’Orazio V, Reitter D (2015). Error-Correction and Aggregation in Crowd-Sourcing of Geopolitical Incident Information. Social Computing, Behavioral-Cultural Modeling, and Prediction, pages 381-387. Online publication date: 17-Mar-2015. https://doi.org/10.1007/978-3-319-16268-3_47

Seidel M, Rasin A, Furst J, Raicu D (2014). Towards Achieving Diagnostic Consensus in Medical Image Interpretation. 2014 IEEE International Conference on Data Mining Workshop, pages 771-780. Online publication date: Dec-2014. https://doi.org/10.1109/ICDMW.2014.134
