Abstract
Evaluating a trained system is an important component of machine learning. Labeling test data for large-scale evaluation of a trained model can be extremely time consuming and expensive. In this paper we propose strategies for estimating the performance of a classifier using as little labeling effort as possible. Specifically, we assume a labeling budget is given and the goal is to obtain a good estimate of the classifier's performance using the provided labeling budget. We propose strategies that yield a precise estimate of classifier accuracy under this restricted labeling budget. We show that these strategies can reduce the variance in the estimate of classifier accuracy by a significant amount compared to simple random sampling (over \(\mathbf{65\%}\) in several cases). In terms of labeling resources, the reduction in the number of samples required (compared to random sampling) to estimate the classifier accuracy to within \(1\%\) error is as high as \(\mathbf{60\%}\) in some cases.
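To make the restricted-budget setting concrete, the sketch below contrasts simple random sampling with one stratified alternative for estimating a classifier's accuracy from a small labeled sample. It is a minimal illustration under assumptions of our own, not the paper's specific strategy: the pool size, the score distribution, the correctness model, the stratification by confidence-score quantiles, and the equal per-stratum allocation are all hypothetical choices made for the example.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unlabeled test pool: each point has a classifier confidence score,
# and `correct` records whether the prediction is right. In practice `correct` is
# unknown until labeling budget is spent on that point.
n_pool = 100_000
scores = rng.beta(2, 2, size=n_pool)
correct = (rng.random(n_pool) < 0.4 + 0.5 * scores).astype(float)

budget = 500  # number of test points we can afford to label

# Baseline: simple random sampling of `budget` points from the pool.
srs_idx = rng.choice(n_pool, size=budget, replace=False)
acc_srs = correct[srs_idx].mean()

# Stratified alternative: bin the pool by score quantiles, label an equal share of
# each stratum, and combine per-stratum accuracies weighted by stratum size.
n_strata = 5
edges = np.quantile(scores, np.linspace(0.0, 1.0, n_strata + 1))
strata = np.digitize(scores, edges[1:-1])   # stratum index in 0..n_strata-1
per_stratum = budget // n_strata

acc_strat = 0.0
for k in range(n_strata):
    members = np.flatnonzero(strata == k)
    labeled = rng.choice(members, size=min(per_stratum, members.size), replace=False)
    acc_strat += (members.size / n_pool) * correct[labeled].mean()

print(f"simple random sampling estimate: {acc_srs:.4f}")
print(f"stratified estimate:             {acc_strat:.4f}")
print(f"true pool accuracy:              {correct.mean():.4f}")

Running both estimators over many repeated trials (drawing a fresh labeled sample each time) is what would expose the variance gap the abstract quantifies; how the strata and the per-stratum allocations should actually be chosen under a fixed budget is the subject of the paper.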
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Kumar, A., Raj, B. (2018). Classifier Risk Estimation Under Limited Labeling Resources. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol. 10937. Springer, Cham. https://doi.org/10.1007/978-3-319-93034-3_1
DOI: https://doi.org/10.1007/978-3-319-93034-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93033-6
Online ISBN: 978-3-319-93034-3
eBook Packages: Computer Science; Computer Science (R0)