
Evidence-based uncertainty sampling for active learning

Published in Data Mining and Knowledge Discovery.

Abstract

Active learning methods select informative instances to learn a suitable classifier effectively. Uncertainty sampling, a frequently used active learning strategy, selects instances about which the model is uncertain, but it does not consider why the model is uncertain. In this article, we present an evidence-based framework that can uncover why a model is uncertain about a given instance. Using this framework, we distinguish two reasons for a model's uncertainty: the model can be uncertain about an instance because it has strong but conflicting evidence for both classes, or because it does not have enough evidence for either class. Our empirical evaluations on several real-world datasets show that distinguishing between these two types of uncertainty has a drastic impact on learning efficiency. We further provide empirical and analytical justifications for why the distinction matters.
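The two types of uncertainty described in the abstract can be made concrete with a small sketch. The snippet below assumes a linear classifier (e.g., logistic regression): each feature contributes `w_j * x_j` to the decision score, and contributions pushing toward each class are pooled into positive and negative evidence. An instance is uncertain when its score is near zero; splitting the uncertain pool by total evidence separates conflicting-evidence instances (strong evidence on both sides) from insufficient-evidence instances (little evidence on either side). The function names, the margin threshold, and the median cut-off are illustrative choices, not the article's exact definitions, which are given by its Eqs. 9–12.

```python
import numpy as np

def evidence(weights, bias, x):
    """Split a linear model's score w.x + b into positive and negative
    evidence: terms pushing toward class +1 pool into e_pos, terms
    pushing toward class -1 into e_neg (stored as a magnitude)."""
    contrib = weights * x
    e_pos = contrib[contrib > 0].sum()
    e_neg = -contrib[contrib < 0].sum()
    # The bias acts as constant evidence for one of the two classes.
    if bias > 0:
        e_pos += bias
    else:
        e_neg -= bias
    return e_pos, e_neg

def split_uncertain(weights, bias, X, margin=0.5):
    """Partition uncertain instances (|score| < margin) into
    conflicting-evidence and insufficient-evidence groups, using the
    median total evidence among uncertain instances as the cut-off."""
    scores = X @ weights + bias
    uncertain = np.where(np.abs(scores) < margin)[0]
    totals = np.array([sum(evidence(weights, bias, X[i])) for i in uncertain])
    cut = np.median(totals)
    conflicting = uncertain[totals >= cut]   # strong evidence for both classes
    insufficient = uncertain[totals < cut]   # little evidence for either class
    return conflicting, insufficient
```

For example, with weights `[1, -1]`, the instance `[2, 2]` is uncertain because two strong contributions cancel (conflicting evidence), while `[0.1, 0.1]` is uncertain because both contributions are tiny (insufficient evidence); both receive the same near-zero score, which is exactly why plain uncertainty sampling cannot tell them apart.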



Notes

  1. 1,507 citations on Google Scholar on April 4th, 2016.

  2. In practice, however, \(E_{+1}(x^{(i)})\) and \(E_{-1}(x^{(i)})\) might not be exactly equal to each other for all uncertain instances, and hence the ranking of uncertain instances based on evidence according to Eqs. 9, 10, 11, and 12 may be different.

  3. This figure does not correspond to a real-time simulation of active learning with users. When the user-provided labels are used, the underlying active learning strategy, whether it be UNC-CE or UNC-IE, would potentially take a different path per user based on their labels. Then, each user would potentially differ on the documents they label, and therefore meaningful comparisons of time and accuracy across users would not be possible.
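Note 2's point, that uncertain instances with unequal positive and negative evidence can be ranked differently depending on how the two evidence values are aggregated, can be illustrated with a toy example. The instance names and evidence values below are hypothetical, and the three aggregations (product, total, minimum) stand in for the kinds of scoring rules the article's Eqs. 9–12 define, not the equations themselves.

```python
# Hypothetical (E_pos, E_neg) evidence values for three uncertain instances.
instances = {"a": (5.0, 2.0), "b": (3.0, 3.0), "c": (1.0, 0.9)}

def rank(score):
    """Order instances by a given evidence-aggregation score, descending."""
    return sorted(instances, key=lambda i: score(*instances[i]), reverse=True)

by_product = rank(lambda ep, en: ep * en)   # a: 10.0, b: 9.0, c: 0.9
by_total   = rank(lambda ep, en: ep + en)   # a: 7.0,  b: 6.0, c: 1.9
by_min     = rank(lambda ep, en: min(ep, en))  # a: 2.0, b: 3.0, c: 0.9
```

Here `by_product` and `by_total` rank instance `a` first, while `by_min` prefers `b`, whose evidence is balanced across the two classes: when the evidence values are not equal, the choice of aggregation changes which uncertain instance is queried next.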


Acknowledgments

This material is based upon work supported by the National Science Foundation CAREER award no. IIS-1350337.

Author information

Corresponding author

Correspondence to Mustafa Bilgic.

Additional information

Responsible editor: Charu Aggarwal.


About this article


Cite this article

Sharma, M., Bilgic, M. Evidence-based uncertainty sampling for active learning. Data Min Knowl Disc 31, 164–202 (2017). https://doi.org/10.1007/s10618-016-0460-3

