Addressing imbalanced data with argument based rule learning
Introduction
In this paper we address the use of expert knowledge in learning rules from imbalanced data, where one of the classes (further called the minority class) contains a much smaller number of examples than the other (majority) classes. Such a situation is often observed in real-world applications. For example, in medical problems the number of patients requiring special attention (e.g., therapy or treatment) is much smaller than the number of patients who do not need it. The same phenomenon has been observed in fraud detection (Whiting, Hansen, McDonald, Albrecht, & Albrecht, 2012), oil spill detection in satellite images, financial risk analysis, predicting technical equipment failures, managing network intrusion and information filtering (Aggarwal, 2015, Chawla, 2005, He, Garcia, 2009, He, Ma, 2013, Krawczyk, Wozniak, Schaefer, 2014, Weiss, 2004). In all these problems, the correct recognition of the minority class is of key importance; however, standard classifiers are biased toward the majority classes. As a result, instances of the minority class tend to be misclassified.
Degradation of classification performance is not caused solely by a skewed class distribution. It has been observed that learning classifiers from such data becomes particularly difficult when other data characteristics occur together with the imbalanced class distribution, such as decomposition of the minority class into many rare sub-concepts (Japkowicz, 2003), extensive overlapping of decision classes (Garcia, Sanchez, & Mollineda, 2007) or the presence of minority class examples inside the majority class regions (Napierala & Stefanowski, 2012b). In several studies (Kubat, Matwin, 1997, Lopez, Fernandez, Garcia, Palade, Herrera, 2013, Napierala, Stefanowski, Wilk, 2010), these difficulty factors have been associated with different types of examples from the minority class. In Napierala and Stefanowski (2012b) we have proposed the following taxonomy of examples from the minority class: safe (located in homogeneous regions populated by examples from one class only), borderline, rare cases and outliers. It has also been shown that rare cases and outliers are extremely difficult to learn.
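For illustration, this taxonomy is commonly operationalized by inspecting the class labels among each minority example's k nearest neighbours. The sketch below follows the 5-NN variant typical of this line of work; the function name, distance measure and exact thresholds are our illustrative assumptions, not a definitive specification of the method.

```python
import numpy as np

def label_minority_examples(X, y, minority_class, k=5):
    """Assign each minority example a difficulty type (safe, borderline,
    rare, outlier) based on how many of its k nearest neighbours share
    its class. Thresholds follow the commonly used 5-NN variant."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    types = {}
    for i in np.where(y == minority_class)[0]:
        # Euclidean distances to all other examples
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the example itself
        nn = np.argsort(d)[:k]
        same = int(np.sum(y[nn] == minority_class))
        if same >= 4:
            types[i] = "safe"
        elif same >= 2:
            types[i] = "borderline"
        elif same == 1:
            types[i] = "rare"
        else:
            types[i] = "outlier"
    return types
```

A minority example surrounded mostly by its own class is thus labelled safe, while one placed deep inside a majority region is labelled an outlier.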
In this paper we concentrate on learning decision rules, due to their better comprehensibility and readability compared to other representations. Let us observe that in domains like medicine, banking, sales or law, where class imbalance is often encountered in analyzed data, it is crucial for practitioners to understand the rationale for recommendations, as it establishes their confidence in the decision support system they use (Doyle, Cunningham, Walsh, 2006, Furnkranz, Gamberger, Lavrac, 2012, Zytkow, 2002).
Although rule learning algorithms have been successfully used in many applications, they are particularly sensitive to the class imbalance problem (Weiss, 2004). Existing techniques for improving rule-based classifiers on imbalanced data either employ pre-processing before learning classifiers or modify the inner bias of the learning algorithms, to concentrate more on the minority class – for a review see Napierala and Stefanowski (2012a). However, these techniques share two important drawbacks. First, they can usually improve the recognition of safe and borderline minority examples, but still have difficulties with rare and outlier examples, which are common in imbalanced data. Second, the improvement on the minority class is usually achieved at the expense of the deterioration on the majority classes. In this paper we propose a new approach which addresses both these problems.
Poor performance of rule classifiers on the rare and outlier examples may be caused by the existence of several possible rules that cover subconcepts represented by only a few minority examples. These rules may have diversified predictive abilities and their consistency with the domain knowledge may vary. Note that evaluation measures employed in automatic induction procedures may be ineffective while choosing the best candidates for rules, as they are estimated with too few learning examples. Moreover, they may fail to identify the best rule that is consistent with the user’s expectations and the expert’s knowledge related to the application domain.
In Weiss (2004), it has already been pointed out that it may be necessary to additionally support the learning process with expert knowledge to overcome the above limitations. This knowledge can aid in the search for rule candidates – for example, it may help to identify the features that are useful for predicting rare, but important, cases (Weiss, 2004). This should lead to the induction of rules which are consistent not only with the learning examples, but also with the domain knowledge. As a result, recognition of the minority class should be improved. Similar opinions are also expressed by Aggarwal (2015) with respect to outliers.
Such an idea is very compelling; yet, surprisingly, almost no solutions that include expert knowledge to support the learning of difficult cases in imbalanced datasets have been developed so far. Therefore, in this paper we propose a new approach that uses expert knowledge to improve rule classifiers learned from imbalanced data. This approach should help induce better rules with good interpretation, resulting in better classification of the minority class (in particular for difficult examples). Moreover, careful use of additional knowledge should weaken the undesired trade-off between the recognition of the minority and majority classes.
Incorporating the background knowledge into the induction of rules has already been considered in the literature (Kietz, Dzeroski, 1994, Klösgen, Zytkow, 2002, Michalski, Bratko, Kubat, 1998), but not in the context of class imbalance. Existing approaches usually assume that experts express their “global” knowledge, valid for the whole domain of application. For example, constraints given by an expert can concern the relation between the attributes, which has to be true for all the data. However, expressing such global knowledge is often difficult for humans or may even not be feasible. Furthermore, note that difficulty of imbalanced data is often associated with the local characteristics of the minority class distributions which cannot be easily modeled with global knowledge.
Recently, Mozina et al. have introduced a new paradigm called argument based machine learning which enables expressing the domain expert knowledge in a more natural and local way (Mozina, Bratko, 2004, Mozina, Bratko, Zabkar, 2007). A key concept of their approach is to let the expert annotate some of the learning examples. An expert can specify so-called arguments that are reasons for assigning the example to the given class. This approach uses a “local” expert knowledge, which applies to specific situations and is valid for limited, chosen examples rather than for the whole domain.
Although such local argumentation has not been proposed in the context of class imbalance, we claim that it is very well suited for handling imbalanced data in rule based classifiers where interpretability is important. First of all, argumentation can be naturally incorporated into rule induction, as both induced hypotheses and arguments are given in the same representation and the influence of the expert’s arguments is explicitly visible in the induced rules (Mozina et al., 2007). Moreover, this kind of explanation is natural for inherently imbalanced domains, such as justifying cases in law or discussing the reasons behind specific decisions in finance or medicine. What is more, the idea of explaining the decisions for selected examples is well suited for handling the minority examples, as the expert can help identify the features that are useful for predicting the most difficult minority cases, reflecting the “local” characteristics of the data. The approach that incorporates argumentation into rule induction will be further called ABRL (argument based rule learning).
In order to demonstrate the usefulness of ABRL for dealing with imbalanced data, we introduce a generalization of the MODLEM rule induction algorithm (Stefanowski, 1998), called ABMODLEM. Although this adaptation is partly inspired by the handling of arguments in Mozina’s proposal of ABCN2 (Mozina & Bratko, 2004), several novel elements have to be introduced, such as a new measure for evaluating candidate elementary conditions to be added to a rule, and a new strategy for classifying unseen instances, which increases the importance of rules induced from argumented examples.
Last but not least, the identification of examples to be argumented by the expert is crucial for the effectiveness and practical usefulness of this approach. We should focus attention on the “problematic”, difficult examples that are likely to improve learning. At the same time, we should limit the number of selected examples and the effort and time required from the expert to provide arguments. To achieve this, we draw inspiration from active learning techniques. Note that active learning has not been applied to such a task yet; limited attempts have concerned the use of active learning in selective preprocessing only (Attenberg & Ertekin, 2013).
We propose three new methods for the identification of examples to be argumented. These methods satisfy the above requirements, are suitable for dealing with imbalanced data and aim to mitigate the undesired trade-off between improved recognition of the minority class and deteriorated recognition of the majority class. Their introduction increases the practical applicability of the ABRL approach, compared to the original formulation (Mozina, Bratko, 2004, Mozina, Bratko, Zabkar, 2007).
To summarize, the contribution of the work will be threefold:
- Introducing a new approach to imbalanced data and demonstrating how argument based learning can be adapted to improve rule classifiers for such data.
- Proposing new, more practical methods for the identification of examples to be argumented, appropriate for dealing with imbalanced classes.
- Verifying experimentally the usefulness of ABMODLEM and the identification methods for improving rule learning from imbalanced datasets, analyzing the trade-off between majority and minority class recognition, and comparing the argument based rule classifier against other rule classifiers.
The paper is organized as follows. Section 2 briefly reviews related research on learning rules from imbalanced data and incorporating expert knowledge. Section 3 describes basic concepts of argument based machine learning. Section 4 presents our ABMODLEM algorithm adapted to argument based learning. Section 5 introduces strategies for the identification of difficult examples to be argumented. Section 6 presents the experimental study carried out on four imbalanced datasets and discusses the obtained results. Finally, Section 7 concludes with the major contributions of this work.
Section snippets
Related works
Methods addressing the class imbalance problem can be categorized into two groups: data-level and algorithmic-level methods (He, Garcia, 2009, Nanni, Fantozzi, Lazzarini, 2015, Stefanowski, 2013). The former, not restricted to rule classifiers, uses preprocessing methods that modify the original imbalanced distribution of classes into the distribution that is more suitable for learning classifiers, e.g. by removing some examples from the majority class, or by introducing additional minority
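As a minimal illustration of such data-level preprocessing, plain random undersampling of the majority class can be sketched as follows. The function name and parameters are our own illustrative choices; practical systems typically use more informed variants (e.g. cleaning or synthetic oversampling methods).

```python
import random

def random_undersample(examples, labels, majority_class, ratio=1.0, seed=0):
    """Randomly drop majority-class examples until the majority/minority
    size ratio is at most `ratio` (1.0 = fully balanced)."""
    rng = random.Random(seed)
    maj = [i for i, c in enumerate(labels) if c == majority_class]
    other = [i for i, c in enumerate(labels) if c != majority_class]
    # keep a random subset of majority indices, all minority indices
    keep_maj = rng.sample(maj, min(len(maj), int(ratio * len(other))))
    kept = sorted(other + keep_maj)
    return [examples[i] for i in kept], [labels[i] for i in kept]
```

Such random removal is simple but risks discarding informative majority examples, which is why more selective preprocessing methods are discussed in the related work.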
Argument based rule learning
We briefly present the concepts of ABRL that are necessary for introducing our proposal. We follow the most closely related works of Mozina et al. and their notation (Mozina et al., 2007), presented in its most complete form in Mozina (2009). It is assumed that some of the learning examples are enhanced by partial justifications given in the form of arguments. Each argument is attached to a single learning example only, while one example can have several arguments. There are two types of arguments;
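In Mozina et al.'s formulation these two types are positive arguments ("class C because of reasons") and negative arguments ("class C despite reasons"). A minimal sketch of how argumented examples might be represented is given below; the class and field names are our own illustration, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Argument:
    """A reason attached to a single example: a conjunction of attribute
    conditions, either supporting ('because') or opposing ('despite')
    the example's class label."""
    conditions: dict            # e.g. {"age": ">60", "smoker": "=yes"}
    positive: bool = True       # True: 'because', False: 'despite'

@dataclass
class ArgumentedExample:
    attributes: dict
    label: str
    # one example can carry several arguments, or none at all
    arguments: list = field(default_factory=list)
```

An expert annotating a patient record could then attach, say, a positive argument that the high-risk label holds because of advanced age, without stating any rule valid for the whole domain.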
ABMODLEM rule induction algorithm
In our previous work (Napierala & Stefanowski, 2010), we wanted to verify whether the ABRL framework could be successfully used with a learning algorithm other than CN2. We decided to incorporate the ABRL paradigm inside the MODLEM algorithm, originally introduced by Stefanowski (1998). Similarly to CN2, it also follows a sequential covering schema and generates a minimal set of rules. Its specific property concerns direct processing of numerical values of attributes (without pre-discretization),
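The sequential covering schema shared by CN2 and MODLEM can be sketched generically as follows. The helper `find_best_rule` stands in for the algorithm-specific search (which is where MODLEM's condition evaluation, and ABMODLEM's argument handling, would plug in); both helper names are our own placeholders.

```python
def sequential_covering(examples, target_class, find_best_rule, covers):
    """Generic separate-and-conquer loop: repeatedly learn one rule for
    the target class, then remove the positive examples it covers,
    until no positives remain (or no further rule can be found)."""
    remaining = list(examples)
    rules = []
    while any(e["label"] == target_class for e in remaining):
        rule = find_best_rule(remaining, target_class)
        if rule is None:
            break
        rules.append(rule)
        # remove only covered positives; negatives stay to constrain later rules
        remaining = [e for e in remaining
                     if not (covers(rule, e) and e["label"] == target_class)]
    return rules
```

Each iteration thus focuses the search on the positive examples not yet explained, which is also the natural point at which argumented examples can steer rule construction.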
Identification of examples for argumentation
Selecting appropriate examples to be argumented by an expert is a crucial issue for the success of argument based rule induction. “Easy” examples, which represent a part of the concept definition supported by many learning examples, will probably be seeds for strong rules correctly built by an induction algorithm itself, and thus do not need to be argumented. One should rather select difficult examples corresponding to more difficult regions of the concept, such as regions under-represented in
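One simple way to locate such difficult examples, in the spirit of active learning, is to flag minority examples that a classifier trained on the remaining data misclassifies. The sketch below is only an illustration of this general idea, not one of the identification strategies proposed in the paper; the function and parameter names are our own.

```python
def select_for_argumentation(examples, labels, minority_class,
                             train_and_predict, budget=5):
    """Leave-one-out screening: a minority example is 'difficult' if a
    classifier trained on all remaining data misclassifies it. Returns
    at most `budget` such examples, limiting the expert's workload."""
    hard = []
    for i, (x, c) in enumerate(zip(examples, labels)):
        if c != minority_class:
            continue
        rest_X = examples[:i] + examples[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        if train_and_predict(rest_X, rest_y, x) != c:
            hard.append(i)
        if len(hard) == budget:
            break
    return hard
```

The `budget` parameter reflects the requirement, stated above, of keeping the number of examples shown to the expert small.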
Goals and evaluation measures
In the experimental study we want to evaluate the effect of argumentation on the recognition of classes. We focus our attention on the minority class; however we are also interested in maintaining sufficiently high recognition of the majority class. First, we evaluate the influence of ABMODLEM components – the new evaluation measure and the classification strategy. We also compare ABMODLEM with its basic, non-argumented origin MODLEM and with some standard rule classifiers.
The main aim of the
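For reference, the class-wise measures commonly used in such imbalanced-data studies, sensitivity (minority-class recall), specificity (majority-class recall) and their geometric mean (G-mean), can be computed as below; the function name is ours.

```python
import math

def gmean(y_true, y_pred, minority_class):
    """Sensitivity, specificity and their geometric mean (G-mean),
    a standard way to capture the minority/majority trade-off."""
    tp = sum(1 for t, p in zip(y_true, y_pred)
             if t == minority_class and p == minority_class)
    fn = sum(1 for t, p in zip(y_true, y_pred)
             if t == minority_class and p != minority_class)
    tn = sum(1 for t, p in zip(y_true, y_pred)
             if t != minority_class and p != minority_class)
    fp = sum(1 for t, p in zip(y_true, y_pred)
             if t != minority_class and p == minority_class)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity, math.sqrt(sensitivity * specificity)
```

Because G-mean collapses to zero when either class is entirely misrecognized, it penalizes classifiers that improve the minority class only at the full expense of the majority class.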
Conclusions
In this paper we have studied adapting argument based rule learning to the class imbalance problem. Such learning perspective has not been considered before. Comparing to previous research on incorporating “global” background knowledge into the learning process, this paradigm allows the domain experts to express as arguments their “local” knowledge about reasons for making classification decisions for some of the learning examples. This kind of local explanation is, in our opinion, particularly
Acknowledgments
The authors’ research is funded by the Polish National Science Center Project no. DEC-2013/11/B/ST6/00963.
References (63)
- Learning classification rules from data. Computers and Mathematics with Applications (2003)
- et al. An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Information Sciences (2013)
- et al. Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification. Open problems on intrinsic data characteristics. Expert Systems with Applications (2012)
- et al. Addressing imbalanced classification with instance generation techniques: IPADE-ID. Neurocomputing (2014)
- et al. Argument based machine learning. Artificial Intelligence Journal (2007)
- et al. Coupling different methods for overcoming the class imbalance problem. Neurocomputing (2015)
- et al. SMOTE-IPF: Addressing the noisy and borderline examples problem in classification with imbalanced datasets via a class noise filtering method-based re-sampling technique. Information Sciences (2015)
- et al. Query learning strategies using boosting and bagging. Proceedings of the 15th international conference on machine learning (2004)
- Rare class learning
- et al. Class imbalance and active learning
- Bagging predictors. Machine Learning
- Data mining for imbalanced datasets: An overview
- The CN2 induction algorithm. Machine Learning
- Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research
- Why is rule learning optimistic and how to correct it. Proceedings of the 17th European conference on machine learning (ECML 2006)
- An evaluation of the usefulness of explanation in a case-based reasoning system for decision support in bronchiolitis treatment. Computational Intelligence
- Invited tutorial on rule induction
- Separate-and-conquer rule learning. Artificial Intelligence Review
- Foundations of rule learning
- An empirical study of the behaviour of classifiers on imbalanced and overlapped datasets. Proceedings of the Iberoamerican congress on pattern recognition, CIARP'07
- LERS – a system for learning from examples based on rough sets
- Managing uncertainty in machine learning from examples. Proceedings of the 3rd international symposium on intelligent systems
- An approach to imbalanced data sets based on changing rule strength. Proceedings of the AAAI workshop at the 17th conference on artificial intelligence: Learning from imbalanced data sets
- A comparison of two approaches to data mining from imbalanced data. Proceedings of the 8th international conference on knowledge-based intelligent information & engineering systems, KES 2004
- Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering
- Imbalanced learning: Foundations, algorithms, and applications
- Class imbalance: Are we focusing on the right issue? Proceedings of the 2nd workshop on learning from imbalanced data sets (ICML)
- Class imbalances versus small disjuncts. ACM SIGKDD Explorations Newsletter
- Inductive logic programming and learnability. SIGART Bulletin
- Cost-sensitive decision tree ensembles for effective imbalanced classification. Applied Soft Computing