Classifier Risk Estimation Under Limited Labeling Resources

Kumar, Anurag; Raj, Bhiksha

doi:10.1007/978-3-319-93034-3_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10937)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Evaluating a trained system is an important component of machine learning. Labeling test data for large-scale evaluation of a trained model can be extremely time-consuming and expensive. In this paper, we propose strategies for estimating the performance of a classifier using as little labeling resource as possible. Specifically, we assume that a labeling budget is given and that the goal is to obtain a good estimate of classifier performance within that budget. We propose strategies to get a precise estimate of classifier accuracy under this restricted labeling budget scenario. We show that these strategies can reduce the variance in the estimate of classifier accuracy by a significant amount compared to simple random sampling (over 65% in several cases). In terms of labeling resource, the reduction in the number of samples required (compared to random sampling) to estimate the classifier accuracy with only 1% error is as high as 60% in some cases.
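The abstract does not name the strategies it proposes. As a rough illustration of how a sampling design can lower the variance of an accuracy estimate relative to simple random sampling under a fixed labeling budget, the sketch below uses stratified sampling with proportional allocation, a standard survey-sampling technique that is assumed here for illustration and is not claimed to be the authors' method. The strata, their weights, their per-stratum accuracies, and the budget are all hypothetical numbers, and the finite-population correction is ignored.

```python
from typing import Sequence


def srs_variance(weights: Sequence[float], accuracies: Sequence[float], budget: int) -> float:
    """Variance of the accuracy estimate under simple random sampling.

    The overall accuracy p is the weighted mean of per-stratum accuracies;
    each labeled instance is a Bernoulli(p) draw, so Var = p(1 - p) / n.
    """
    p = sum(w * a for w, a in zip(weights, accuracies))
    return p * (1.0 - p) / budget


def stratified_variance(weights: Sequence[float], accuracies: Sequence[float], budget: int) -> float:
    """Variance under stratified sampling with proportional allocation.

    Stratum h receives n_h = n * W_h labels, so
    Var = sum_h W_h^2 * p_h(1 - p_h) / n_h = sum_h W_h * p_h(1 - p_h) / n.
    """
    return sum(w * a * (1.0 - a) for w, a in zip(weights, accuracies)) / budget


# Hypothetical population: three strata, e.g. test instances grouped by
# classifier confidence, with made-up weights and per-stratum accuracies.
weights = [0.5, 0.3, 0.2]
accuracies = [0.98, 0.85, 0.60]
budget = 500  # hypothetical number of labels we can afford

v_srs = srs_variance(weights, accuracies, budget)
v_strat = stratified_variance(weights, accuracies, budget)
reduction = 1.0 - v_strat / v_srs

print(f"SRS variance:        {v_srs:.6e}")
print(f"Stratified variance: {v_strat:.6e}")
print(f"Relative reduction:  {reduction:.1%}")
```

In this toy setting the gain comes entirely from the strata having different accuracies: if every stratum had the same accuracy, the two variances would coincide. In practice the per-stratum accuracies are unknown before labeling, so they would have to be approximated, for example from classifier scores.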

Author information

Authors and Affiliations

Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Anurag Kumar & Bhiksha Raj

Corresponding author

Correspondence to Anurag Kumar.

Editor information

Editors and Affiliations

Deakin University, Geelong, Victoria, Australia
Dinh Phung
National Chiao Tung University, Hsinchu City, Taiwan
Vincent S. Tseng
Monash University, Clayton, Victoria, Australia
Geoffrey I. Webb
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Bao Ho
University of Melbourne, Melbourne, Victoria, Australia
Mohadeseh Ganji
University of Melbourne, Melbourne, Victoria, Australia
Lida Rashidi

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 319 KB)

About this paper

Cite this paper

Kumar, A., Raj, B. (2018). Classifier Risk Estimation Under Limited Labeling Resources. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10937. Springer, Cham. https://doi.org/10.1007/978-3-319-93034-3_1

DOI: https://doi.org/10.1007/978-3-319-93034-3_1
Published: 19 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93033-6
Online ISBN: 978-3-319-93034-3
eBook Packages: Computer Science, Computer Science (R0)
