Active learning in multiple-class classification problems via individualized binary models
Introduction
It is common for a classification rule to be learned under a training/testing framework using a given labeled dataset; thus, a sufficient amount of labeled data is essential to constructing a reliable classification rule. When the labeled data are scarce but a considerable amount of unlabeled data is available, how to enlarge the training set to improve the classification rule is an important question. If examining the labels of the unlabeled data is costly and time-consuming, then recruiting first the "crucial" subjects, those that may substantially change the classification rule, becomes important for accelerating the training process and reducing labeling costs. In the machine learning literature, the term "active learning" refers to learning with aggressive subject-selection strategies; in the statistical literature, from a data-recruiting perspective, such methods relate to sequential methods. Because new training samples are recruited by analyzing the current data, stochastic regression is a useful framework for studying such active learning processes. If we use criteria adopted in statistical experimental design to assess the unlabeled data and recruit only the most "informative" subjects into the training set, we can accelerate the learning process and further reduce the labeling cost. In the machine learning literature, on the other hand, researchers usually use binary classifiers as the building blocks for multiclass classification rules, and so they tend to focus more on computational strategies such as one-versus-one and one-versus-others (Aly, 2005).
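To make the binary building-block idea concrete, the following is a minimal one-versus-others sketch in plain NumPy. The gradient-ascent fit, function names, and learning-rate settings are illustrative choices of ours, not the method developed in this paper.

```python
import numpy as np

def fit_binary_logistic(X, y, lr=0.1, n_iter=500):
    """Fit a binary logistic model (labels in {0,1}) by gradient ascent
    on the log-likelihood; X should include an intercept column."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)
    return w

def one_vs_rest_fit(X, labels, n_classes):
    """One binary model per class: class k versus all the others."""
    return [fit_binary_logistic(X, (labels == k).astype(float))
            for k in range(n_classes)]

def one_vs_rest_predict(models, X):
    """Assign each sample to the class whose binary model scores highest."""
    scores = np.column_stack([X @ w for w in models])
    return scores.argmax(axis=1)

# Toy usage: three well-separated Gaussian clusters.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X_raw = np.vstack([c + 0.3 * rng.normal(size=(30, 2)) for c in centers])
y = np.repeat(np.arange(3), 30)
X = np.column_stack([np.ones(len(X_raw)), X_raw])   # add intercept column
pred = one_vs_rest_predict(one_vs_rest_fit(X, y, 3), X)
```

Comparing the raw linear scores across the individual binary models, as done here, is the simplest aggregation rule; probability-calibrated variants are also common.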
Many active learning procedures, each targeting a different goal, address specific applications and technical considerations (Lewis and Gale, 1994, Osugi et al., 2005, Lughofer, 2012, Rubens et al., 2016, Cohn et al., 1996, Zhu et al., 2003, Bouneffouf, 2016); Settles (2010) presents a general review of this topic. In particular, Deng et al. (2009) applied an active learning method to a money-laundering application using a bank-customer dataset, while Wang et al. (2019) considered variable- and data-selection strategies to accelerate the learning process and to identify effective variables from a computational perspective. However, there is a scarcity of research that applies active learning to multiclass classification problems. Here, we study multiclass classification problems under an active learning scenario, using a unified computational approach for both categorical and ordinal labeling situations. We adaptively and sequentially recruit new unlabeled data from a data pool, using an experimental design criterion as an information assessment tool; we additionally adopt a stopping criterion based on sequential estimation methods to ensure the satisfactory performance of the final classification rule.
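The selection step can be sketched as follows. This is our own illustration of an experimental-design-style score for a single binary logistic model: each unlabeled point is ranked by its prediction variance under the current fit, a D-optimality-flavored quantity. The exact criterion of the paper is not reproduced in this preview, and the function names and the small ridge term are our assumptions.

```python
import numpy as np

def fisher_information(X_labeled, w):
    """Fisher information of a logistic model at the labeled design points:
    sum over i of p_i (1 - p_i) x_i x_i'."""
    p = 1.0 / (1.0 + np.exp(-X_labeled @ w))
    return (X_labeled * (p * (1 - p))[:, None]).T @ X_labeled

def most_informative(X_pool, X_labeled, w):
    """Index of the unlabeled point with the largest approximate prediction
    variance x' M^{-1} x, where M is the information matrix; a small ridge
    keeps M invertible early on, when few points are labeled."""
    M = fisher_information(X_labeled, w) + 1e-6 * np.eye(X_labeled.shape[1])
    M_inv = np.linalg.inv(M)
    var = np.einsum('ij,jk,ik->i', X_pool, M_inv, X_pool)
    return int(var.argmax())

# Usage: with the labeled points clustered near the origin, the distant
# pool point is the most informative one under this score.
X_lab = np.array([[1, 0.1, 0.0], [1, -0.1, 0.1], [1, 0.0, -0.1], [1, 0.1, 0.1]])
X_pool = np.array([[1, 0.05, 0.05], [1, 3.0, 3.0], [1, -0.02, 0.1]])
idx = most_informative(X_pool, X_lab, np.zeros(3))
```

In a full active learning loop, the selected point would be labeled, moved from the pool into the training set, the model refit, and the loop repeated until a stopping criterion is met.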
In the remainder of this paper, we first state our approach for both categorical and ordinal data in the adaptive design situation and then describe our active learning procedure. Section 3 presents the numerical results from using both simulated data and real data examples, followed by a summary. The proofs of the theorems and other technical details are in the Appendix.
Section snippets
Methods
We consider multiclass classification problems in either ordinal or categorical labeling situations and assume in both cases that a sample is assigned to only one class. From a statistical perspective, ordinal labeling refers to the situation in which there is an ordinal relation among classes; this is common in applications (e.g., the label indicates the severity of a disease or a product preference). Categorically labeled data contain no such ordinal relations among classes. Thus, we typically
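The distinction between the two labeling situations can be made concrete with the two standard link formulations: a baseline-category (multinomial) logit for categorical labels and a cumulative (proportional-odds) logit for ordinal labels. The function names and the sign convention below are our illustrative choices, not necessarily the paper's exact parameterization.

```python
import numpy as np

def categorical_probs(x, B):
    """Baseline-category (multinomial) logit: no order among the K classes.
    B has one coefficient row per non-baseline class; class 0 is baseline."""
    scores = np.concatenate([[0.0], B @ x])
    e = np.exp(scores - scores.max())        # stabilized softmax
    return e / e.sum()

def ordinal_probs(x, alphas, beta):
    """Cumulative (proportional-odds) logit: classes are ordered; a single
    shared slope beta and increasing cut-points alphas give P(Y <= k),
    and class probabilities are successive differences."""
    cum = 1.0 / (1.0 + np.exp(-(np.asarray(alphas) + beta @ x)))
    cum = np.concatenate([cum, [1.0]])       # P(Y <= K-1) = 1
    return np.diff(np.concatenate([[0.0], cum]))

x = np.array([0.5, -1.0])
p_cat = categorical_probs(x, np.array([[1.0, 0.0], [0.0, 1.0]]))
p_ord = ordinal_probs(x, [-1.0, 1.0], np.array([0.5, 0.5]))
```

The ordinal model uses one slope vector and K-1 ordered intercepts, whereas the categorical model estimates a separate slope vector per non-baseline class; this difference in parameter counts is one reason the two labeling situations are usually treated separately in the statistical literature.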
Synthesized data
For both the categorical and ordinal cases, we conduct three numerical studies of three-class classification. Following the notation defined in (3), we have for a three-class situation, and . In the categorical case, we set the true intercept coefficient to equal 0; hence, in theory, the sizes of all classes are balanced. When , we set and . In the following, we call this "Categorical Scenario 1" (CS.1). When , we consider two
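Because the coefficient values from (3) are not reproduced in this preview, the generator below uses placeholder coefficients of our own; only the property mentioned above, that zero intercepts give (approximately) balanced class sizes, is taken from the text.

```python
import numpy as np

def simulate_multinomial(n, B, rng):
    """Draw (X, y) from a baseline-category multinomial logit model.
    B holds one coefficient row per non-baseline class, with the intercept
    in the first column; X gets an intercept column plus standard normals."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, B.shape[1] - 1))])
    scores = np.column_stack([np.zeros(n), X @ B.T])   # baseline score = 0
    P = np.exp(scores - scores.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    y = np.array([rng.choice(P.shape[1], p=p) for p in P])
    return X, y

# Placeholder three-class setting with zero intercepts (illustrative only).
rng = np.random.default_rng(1)
B = np.array([[0.0, 1.0, -1.0],
              [0.0, -1.0, 1.0]])
X, y = simulate_multinomial(3000, B, rng)
counts = np.bincount(y, minlength=3)
```

With the intercepts set to zero, no class is systematically favored, so all three classes appear with substantial frequency in the simulated sample.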
Conclusion
There are many ways to construct multiclass classification rules. In the statistical literature, the characteristics of categorical and ordinal labels in multiclass classification problems are usually treated differently, whereas in the machine learning literature, the difference between these two types of labels is usually ignored. Indeed, machine learning studies are more concerned with computational strategies than with modeling strategies, such as the required number of binary classifiers
Acknowledgments
The authors are grateful to the Editor-in-Chief, Associate Editor, and anonymous referees for comments and suggestions that led to substantial improvements in this paper. This work was partially supported by the Ministry of Science and Technology, Taiwan, ROC (106-2118-M-001-007-MY2), and by the co-corresponding author Dr. Wang's grants from the National Natural Science Foundation of China (11971457) and the Anhui Provincial Natural Science Foundation, China (1908085MA06).
References (27)
Lughofer (2012). Hybrid active learning for reducing the annotation effort of operators in classification systems. Pattern Recognit.
Agresti (2010). Analysis of Ordinal Categorical Data, Vol. 656.
Aly (2005). Survey on Multiclass Classification Methods.
Anscombe (1952). Large-sample theory of sequential estimation.
Begg and Gray (1984). Calculation of polychotomous logistic regression parameters using individualized regressions. Biometrika.
Bouneffouf (2016). Exponentiated gradient exploration for active learning. Computers.
Chang (2001). Sequential confidence regions of generalized linear models with adaptive designs. J. Statist. Plann. Inference.
Chow and Robbins (1965). On the asymptotic theory of fixed-width sequential confidence intervals for the mean. Ann. Math. Stat.
Chow and Teicher (1988). Probability Theory: Independence, Interchangeability, Martingales.
Cohn et al. (1996). Active learning with statistical models. J. Artif. Intell. Res.
Deng et al. (2009). Active learning through sequential design, with applications to detection of money laundering. J. Amer. Statist. Assoc.
Asymptotic normality for sums of dependent random variables.
Lai and Wei (1982). Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. Ann. Statist.