Abstract
In active learning, a classification rule is built by selecting training subjects sequentially: at each stage, users choose the next subjects, without knowing their labels, based on the model learned so far. For classification rules based on parametric models, methods from statistical experimental design are popular guidelines for selecting new training subjects. However, no counterpart exists for classifiers based on non-parametric models, such as support vector machines. We therefore propose a subject selection scheme based on an extended influential index for the area under the receiver operating characteristic curve (AUC), which is applicable to general classifiers with continuous scores.
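As a rough illustration of the quantities involved, the sketch below computes the empirical (Mann-Whitney) area under the receiver operating characteristic curve from continuous classifier scores, and then a simple leave-one-out measure of how much each labeled subject changes that area. This is not the extended influential index proposed in the article, whose definition is not given in this abstract; the function names `empirical_auc` and `loo_auc_influence` and the toy scores are assumptions made for illustration only.

```python
import numpy as np


def empirical_auc(pos_scores, neg_scores):
    """Mann-Whitney estimate: P(S_pos > S_neg) + 0.5 * P(S_pos == S_neg)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    # Compare every positive score with every negative score; ties count half.
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


def loo_auc_influence(pos_scores, neg_scores):
    """Hypothetical helper: change in empirical AUC when one subject is deleted.

    Returns the full-sample AUC and, for each positive and each negative
    subject, (full AUC) - (AUC without that subject).
    """
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    full = empirical_auc(pos, neg)
    infl_pos = np.array(
        [full - empirical_auc(np.delete(pos, i), neg) for i in range(len(pos))]
    )
    infl_neg = np.array(
        [full - empirical_auc(pos, np.delete(neg, j)) for j in range(len(neg))]
    )
    return full, infl_pos, infl_neg


# Toy scores (assumed, not from the article): any classifier with continuous
# scores, e.g. SVM decision values, could supply these.
pos = [0.9, 0.8, 0.3]
neg = [0.7, 0.2, 0.1]
auc, infl_pos, infl_neg = loo_auc_influence(pos, neg)
print(auc)       # 8 of 9 positive/negative pairs are correctly ordered
print(infl_pos)  # the low-scoring positive (0.3) moves the AUC the most
print(infl_neg)  # the high-scoring negative (0.7) moves the AUC the most
```

Because only the scores enter the computation, a measure of this kind does not depend on the form of the classifier, which is the sense in which a score-based index can be called model-free.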


Cite this article
Ke, BS., Chang, Yc.I. A Model-Free Subject Selection Method for Active Learning Classification Procedures. J Classif 38, 544–555 (2021). https://doi.org/10.1007/s00357-021-09388-3