Elsevier

Expert Systems with Applications

Volume 42, Issue 24, 30 December 2015, Pages 9468-9481

Addressing imbalanced data with argument based rule learning

https://doi.org/10.1016/j.eswa.2015.07.076

Highlights

  • We improve learning rules from imbalanced data by using expert knowledge.

  • An expert explains the decision for some critical examples, giving arguments.

  • Three methods of identifying critical examples are proposed and compared.

  • Induced rules reflect the expert knowledge and better classify the minority examples.

  • Trade-off between the recognition of the minority and majority classes is maintained.

Abstract

In this paper we focus on improving rule-based classifiers learned from class-imbalanced data by incorporating expert knowledge into the learning process. Applying expert knowledge should overcome the limitations of standard methods for imbalanced data when minority classes contain many rare examples and outliers. It should also improve recognition of the minority class while maintaining better classification accuracy on the majority classes than the standard methods. Unlike existing proposals for integrating global expert knowledge into rule induction, class imbalance requires considering local characteristics of class distributions. Therefore, we consider argument based learning, where a domain expert can annotate (explain) some of the learning examples to describe reasons for assigning them to specific classes. Using local arguments should improve the interpretability of rules and their consistency with the domain knowledge, and should also result in better recognition of the minority class. The main aim of our study is to show how argument based learning can be adapted to learn rules from imbalanced data. To achieve this, we introduce a new argument based rule induction algorithm, ABMODLEM, with a specialized classification strategy for imbalanced classes. Then, we propose new methods for identifying the examples which should be explained by an expert; they exploit the idea of active learning with querying an ensemble. The proposed approach has been evaluated in an extensive computational experiment. Results show that argument based learning improves minority class recognition, especially for difficult data distributions with rare examples and outliers. Moreover, ABMODLEM is compared against standard rule classifiers and their extensions with SMOTE preprocessing.

Introduction

In this paper we deal with using expert knowledge in learning rules from imbalanced data, where one of the classes (further called the minority class) contains a much smaller number of examples than the other (majority) classes. Such a situation is often observed in real-world applications. For example, in medical problems the number of patients requiring special attention (e.g., therapy or treatment) is much smaller than the number of patients who do not need it. The same phenomenon has been observed in fraud detection (Whiting, Hansen, McDonald, Albrecht, & Albrecht, 2012), oil spill detection in satellite images, financial risk analysis, predicting technical equipment failures, managing network intrusion and information filtering (Aggarwal, 2015, Chawla, 2005, He, Garcia, 2009, He, Ma, 2013, Krawczyk, Wozniak, Schaefer, 2014, Weiss, 2004). In all these problems, the correct recognition of the minority class is of key importance; however, standard classifiers are biased toward the majority classes. As a result, instances of the minority class tend to be misclassified.

Degradation of classification performance is not caused solely by a skewed class distribution. It has been observed that learning classifiers from such data becomes particularly difficult when other data characteristics occur together with the imbalanced distribution of classes, such as decomposition of the minority class into many rare sub-concepts (Japkowicz, 2003), extensive overlapping of decision classes (Garcia, Sanchez, & Mollineda, 2007), or the presence of minority class examples inside majority class regions (Napierala & Stefanowski, 2012b). In several studies (Kubat, Matwin, 1997, Lopez, Fernandez, Garcia, Palade, Herrera, 2013, Napierala, Stefanowski, Wilk, 2010), these difficulty factors have been associated with different types of examples from the minority class. In Napierala and Stefanowski (2012b) we have proposed the following taxonomy of examples from the minority class: safe (located in homogeneous regions populated by examples from one class only), borderline, rare cases and outliers. It has also been shown that rare cases and outliers are extremely difficult to learn.
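The taxonomy above is determined locally: a minority example is labelled by how many of its k nearest neighbours share its class. A minimal sketch of this idea (the thresholds for k = 5 follow the scheme described in Napierala & Stefanowski, 2012b; the function and variable names are ours, not the paper's):

```python
import math


def example_type(x, X, y, minority_label, k=5):
    """Label a minority example x as safe/borderline/rare/outlier
    according to how many of its k nearest neighbours in (X, y)
    belong to the same (minority) class."""
    # sort all other examples by Euclidean distance to x
    dists = sorted(
        (math.dist(x, xi), yi) for xi, yi in zip(X, y) if xi is not x
    )
    same = sum(1 for _, yi in dists[:k] if yi == minority_label)
    if same >= 4:
        return "safe"        # surrounded almost only by its own class
    if same >= 2:
        return "borderline"  # near the decision boundary
    if same == 1:
        return "rare"        # tiny isolated sub-concept
    return "outlier"         # alone inside the majority region
```

A rare case or outlier found this way is exactly the kind of example for which automatic induction has too little evidence and expert arguments are most valuable.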

In this paper we concentrate on learning decision rules, due to their better comprehensibility and readability compared to other representations. Let us observe that in domains like medicine, banking, sales or law, where class imbalance is often encountered in the analyzed data, it is crucial for practitioners to understand the rationale for recommendations, as it establishes their confidence in the decision support system they use (Doyle, Cunningham, Walsh, 2006, Furnkranz, Gamberger, Lavrac, 2012, Zytkow, 2002).

Although rule learning algorithms have been successfully used in many applications, they are particularly sensitive to the class imbalance problem (Weiss, 2004). Existing techniques for improving rule-based classifiers on imbalanced data either employ pre-processing before learning classifiers or modify the inner bias of the learning algorithms, to concentrate more on the minority class – for a review see Napierala and Stefanowski (2012a). However, these techniques share two important drawbacks. First, they can usually improve the recognition of safe and borderline minority examples, but still have difficulties with rare and outlier examples, which are common in imbalanced data. Second, the improvement on the minority class is usually achieved at the expense of the deterioration on the majority classes. In this paper we propose a new approach which addresses both these problems.

Poor performance of rule classifiers on rare and outlier examples may be caused by the existence of several possible rules that cover subconcepts represented by only a few minority examples. These rules may have diversified predictive abilities, and their consistency with the domain knowledge may vary. Note that the evaluation measures employed in automatic induction procedures may be ineffective when choosing the best rule candidates, as they are estimated from too few learning examples. Moreover, they may fail to identify the best rule that is consistent with the user’s expectations and the expert’s knowledge of the application domain.

In Weiss (2004), it has already been pointed out that it may be necessary to additionally support the learning process with expert knowledge to overcome the above limitations. This knowledge can aid the search for rule candidates – for example, an expert may help to identify the features that are useful for predicting rare, but important, cases (Weiss, 2004). This should lead to the induction of rules which are consistent not only with the learning examples, but also with the domain knowledge. As a result, the recognition of the minority class should be improved. Similar opinions are also expressed by Aggarwal (2015) with respect to outliers.

Such an idea is very compelling, yet, surprisingly, almost no solutions that include expert knowledge to support the learning of difficult cases in imbalanced datasets have been developed so far. Therefore, in this paper we propose a new approach that uses expert knowledge to improve rule classifiers learned from imbalanced data. This approach should help induce rules with good interpretability that classify the minority class better (in particular its difficult examples). Moreover, careful use of additional knowledge should weaken the undesired trade-off between the recognition of the minority and majority classes.

Incorporating background knowledge into the induction of rules has already been considered in the literature (Kietz, Dzeroski, 1994, Klösgen, Zytkow, 2002, Michalski, Bratko, Kubat, 1998), but not in the context of class imbalance. Existing approaches usually assume that experts express their “global” knowledge, valid for the whole application domain. For example, constraints given by an expert can concern a relation between attributes which has to hold for all the data. However, expressing such global knowledge is often difficult for humans or may not even be feasible. Furthermore, note that the difficulty of imbalanced data is often associated with the local characteristics of the minority class distributions, which cannot be easily modeled with global knowledge.

Recently, Mozina et al. have introduced a new paradigm called argument based machine learning which enables expressing the domain expert knowledge in a more natural and local way (Mozina, Bratko, 2004, Mozina, Bratko, Zabkar, 2007). A key concept of their approach is to let the expert annotate some of the learning examples. An expert can specify so-called arguments that are reasons for assigning the example to the given class. This approach uses a “local” expert knowledge, which applies to specific situations and is valid for limited, chosen examples rather than for the whole domain.

Although such local argumentation has not been proposed in the context of class imbalance, we claim that it is very well suited to handling imbalanced data in rule based classifiers where interpretability is important. First of all, argumentation can be naturally incorporated into rule induction, as both the induced hypotheses and the arguments are given in the same representation, and the influence of the expert’s arguments is explicitly visible in the induced rules (Mozina et al., 2007). Moreover, this kind of explanation is natural for inherently imbalanced domains, such as justifying cases in law or discussing the reasons behind specific decisions in finance or medicine. What is more, the idea of explaining the decisions for selected examples is well suited to handling the minority examples, as the expert can help identify the features that are useful for predicting the most difficult minority cases, taking into account the “local” characteristics of the data. The approach that incorporates argumentation into rule induction will be further called ABRL (argument based rule learning).
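The arguments themselves are simple, local objects: a conjunction of attribute conditions attached to a single example. A hypothetical sketch of such a representation (the field names and the medical example are purely illustrative and are not taken from the paper):

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    """One expert-given reason for (or against) an example's class,
    expressed as a conjunction of attribute conditions,
    e.g. {"fever": "high", "age": ">60"}."""
    conditions: dict
    positive: bool = True  # False would mark a counter-argument


@dataclass
class ArgumentedExample:
    """A learning example optionally enhanced with expert arguments;
    most examples keep an empty argument list."""
    attributes: dict
    label: str
    arguments: list = field(default_factory=list)


# illustrative use: "this patient is in the minority class
# *because* fever is high and age is over 60"
ex = ArgumentedExample(
    attributes={"fever": "high", "age": 72, "cough": "no"},
    label="pneumonia",
    arguments=[Argument({"fever": "high", "age": ">60"})],
)
```

Because arguments share the attribute-condition language of rules, an induction algorithm can require that a rule covering an argumented example also contain the argument's conditions.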

In order to demonstrate the usefulness of ABRL for dealing with imbalanced data, we introduce a generalization of the MODLEM rule induction algorithm (Stefanowski, 1998), called ABMODLEM. Although this adaptation is partly inspired by the handling of arguments in Mozina’s proposal of ABCN2 (Mozina & Bratko, 2004), several novel elements have to be introduced, such as a new measure for evaluating candidate elementary conditions to be added to a rule, and a new strategy for classifying unseen instances, which increases the importance of rules induced from argumented examples.

Last but not least, the identification of examples to be argumented by the expert is crucial for the effectiveness and practical usefulness of this approach. We should focus attention on the “problematic”, difficult examples that are likely to improve learning. At the same time, we should limit the number of selected examples and the effort and time required from the expert to provide her arguments. To achieve this, we draw inspiration from active learning techniques. Note that active learning has not been applied to such a task yet. Limited attempts concerned using active learning in selective preprocessing only (Attenberg & Ertkin, 2013).

We propose three new methods for the identification of examples to be argumented. These methods satisfy the above requirements, are suitable for dealing with imbalanced data and try to handle the undesired trade-off between improvement of the minority class recognition and deterioration in the majority class. Their introduction increases the practical applicability of the ABRL approach, compared to the original formulation (Mozina, Bratko, 2004, Mozina, Bratko, Zabkar, 2007).
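One way to realize such query-by-ensemble selection is to rank examples by how strongly the committee disagrees about them and ask the expert only about the most contentious minority examples. An illustrative sketch (vote entropy is one common disagreement measure from the active learning literature; the paper's three concrete methods may differ):

```python
import math


def select_for_argumentation(votes, labels, minority_label, m=2):
    """Return the indices of the m minority examples on which an
    ensemble disagrees most.  `votes[i]` is the list of class
    predictions made by the committee members for example i."""
    def vote_entropy(preds):
        # entropy of the committee's vote distribution; 0 = unanimous
        n = len(preds)
        return -sum(
            (c / n) * math.log(c / n)
            for c in (preds.count(v) for v in set(preds))
        )

    scored = [
        (vote_entropy(v), i)
        for i, (v, lab) in enumerate(zip(votes, labels))
        if lab == minority_label
    ]
    scored.sort(reverse=True)  # most contested first
    return [i for _, i in scored[:m]]
```

Restricting the ranking to minority examples keeps the expert's limited annotation budget focused on the class that standard induction handles worst.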

To summarize, the contribution of this work is threefold:

  • Introducing a new approach to imbalanced data and demonstrating how argument based learning could be adapted to improve rule classifiers for such data

  • Proposing new, more practical methods for the identification of examples to be argumented, appropriate for dealing with imbalanced classes

  • Verifying experimentally the usefulness of ABMODLEM and identification methods for improving rule learning from imbalanced datasets and analyzing the trade-off between majority and minority class recognition. Furthermore, comparing the argument based rule classifier against other rule classifiers.

The paper is organized as follows. Section 2 briefly reviews related research on learning rules from imbalanced data and incorporating expert knowledge. Section 3 describes basic concepts of argument based machine learning. Section 4 presents our ABMODLEM algorithm adapted to argument based learning. Section 5 introduces strategies for identification of difficult examples to be argumented. Section 6 presents the experimental study carried on four imbalanced datasets and discusses the obtained results. Finally, Section 7 concludes with major contributions of this work.

Section snippets

Related works

Methods addressing the class imbalance problem can be categorized into two groups: data-level and algorithmic-level methods (He, Garcia, 2009, Nanni, Fantozzi, Lazzarini, 2015, Stefanowski, 2013). The former, not restricted to rule classifiers, uses preprocessing methods that modify the original imbalanced distribution of classes into the distribution that is more suitable for learning classifiers, e.g. by removing some examples from the majority class, or by introducing additional minority
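The data-level route can be illustrated by SMOTE-style oversampling, which the paper also uses as a preprocessing baseline in its experiments. Below is a minimal, simplified sketch of the interpolation idea (not the exact reference implementation of SMOTE):

```python
import math
import random


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority examples: pick a minority
    example, pick one of its k nearest minority neighbours, and
    interpolate a new point somewhere on the segment between them."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb))
        )
    return synthetic
```

Note that such interpolation helps mostly in safe and borderline regions; for rare cases and outliers the neighbours may belong to other sub-concepts, which is exactly the weakness the expert arguments are meant to address.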

Argument based rule learning

We briefly present the concepts of ABRL that are necessary for introducing our proposal. We follow the most closely related works of Mozina et al. and their notation (Mozina et al., 2007), presented in its most complete form in Mozina (2009). It is assumed that some of the learning examples are enhanced by partial justifications given in the form of arguments. Each argument is attached to a single learning example only, while one example can have several arguments. There are two types of arguments;

ABMODLEM rule induction algorithm

In our previous work (Napierala & Stefanowski, 2010), we wanted to verify whether the ABRL framework could be successfully used with a learning algorithm other than CN2. We decided to incorporate the ABRL paradigm into the MODLEM algorithm, originally introduced by Stefanowski (1998). Similarly to CN2, it also follows a sequential covering schema and generates a minimal set of rules. Its specific property concerns direct processing of numerical values of attributes (without pre-discretization),

Identification of examples for argumentation

Selecting appropriate examples to be argumented by an expert is a crucial issue for the success of argument based rule induction. “Easy” examples, which represent a part of the concept definition supported by many learning examples, will probably be seeds for strong rules correctly built by an induction algorithm itself, and thus do not need to be argumented. One should rather select difficult examples corresponding to more difficult regions of the concept, such as regions under-represented in

Goals and evaluation measures

In the experimental study we want to evaluate the effect of argumentation on the recognition of classes. We focus our attention on the minority class; however, we are also interested in maintaining sufficiently high recognition of the majority class. First, we evaluate the influence of the ABMODLEM components – the new evaluation measure and the classification strategy. We also compare ABMODLEM with its basic, non-argumented predecessor MODLEM and with some standard rule classifiers.

The main aim of the
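Class-wise evaluation on imbalanced data typically reports minority-class sensitivity together with a measure that balances both classes, such as the geometric mean of sensitivity and specificity. A generic sketch of that measure (illustrative; not necessarily the exact metric set used in the paper):

```python
import math


def gmean(y_true, y_pred, minority_label):
    """Geometric mean of minority-class sensitivity and majority-class
    specificity -- a score that drops to 0 if either class is ignored,
    so the majority class cannot dominate it."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == minority_label)
    tn = sum(1 for t, p in zip(y_true, y_pred)
             if t != minority_label and p != minority_label)
    pos = sum(1 for t in y_true if t == minority_label)
    neg = len(y_true) - pos
    sensitivity = tp / pos if pos else 0.0
    specificity = tn / neg if neg else 0.0
    return math.sqrt(sensitivity * specificity)
```

Tracking sensitivity and this balanced score together exposes the trade-off the paper analyzes: a method that boosts the minority class at a large cost to the majority class shows up as a low geometric mean.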

Conclusions

In this paper we have studied adapting argument based rule learning to the class imbalance problem. Such a learning perspective has not been considered before. Compared to previous research on incorporating “global” background knowledge into the learning process, this paradigm allows the domain experts to express as arguments their “local” knowledge about the reasons for making classification decisions for some of the learning examples. This kind of local explanation is, in our opinion, particularly

Acknowledgments

The authors’ research is funded by the Polish National Science Center Project no. DEC-2013/11/B/ST6/00963.

References (63)

  • Breiman, L. (1996). Bagging predictors. Machine Learning.

  • Chawla, N. (2005). Data mining for imbalanced datasets: An overview.

  • Clark, P., et al. (1989). The CN2 induction algorithm. Machine Learning.

  • Demsar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research.

  • Demsar, J., et al. (2006). Why is rule learning optimistic and how to correct it. Proceedings of the 17th European Conference on Machine Learning (ECML 2006).

  • Doyle, D., et al. (2006). An evaluation of the usefulness of explanation in a case-based reasoning system for decision support in bronchiolitis treatment. Computational Intelligence.

  • Flach, P. (2001). Invited tutorial on rule induction.

  • Furnkranz, J. (1999). Separate-and-conquer rule learning. Artificial Intelligence Review.

  • Furnkranz, J., et al. (2012). Foundations of rule learning.

  • Garcia, V., et al. (2007). An empirical study of the behaviour of classifiers on imbalanced and overlapped datasets. Proceedings of the Iberoamerican Congress on Pattern Recognition (CIARP’07).

  • Grzymala-Busse, J. LERS – a system for learning from examples based on rough sets.

  • Grzymala-Busse, J. (1994). Managing uncertainty in machine learning from examples. Proceedings of the 3rd International Symposium on Intelligent Systems.

  • Grzymala-Busse, J., et al. (2000). An approach to imbalanced data sets based on changing rule strength. Proceedings of the AAAI Workshop at the 17th Conference on Artificial Intelligence: Learning from Imbalanced Data Sets.

  • Grzymala-Busse, J., et al. (2004). A comparison of two approaches to data mining from imbalanced data. Proceedings of the 8th International Conference on Knowledge-Based Intelligent Information & Engineering Systems (KES 2004).

  • He, H., et al. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering.

  • He, H., et al. (2013). Imbalanced learning: Foundations, algorithms, and applications.

  • Japkowicz, N. (2003). Class imbalance: Are we focusing on the right issue? Proceedings of the 2nd Workshop on Learning from Imbalanced Data Sets (ICML).

  • Jo, T., et al. (2004). Class imbalances versus small disjuncts. ACM SIGKDD Explorations Newsletter.

  • Kietz, J., et al. (1994). Inductive logic programming and learnability. SIGART Bulletin.

  • Krawczyk, B., et al. (2014). Cost-sensitive decision tree ensembles for effective imbalanced classification. Applied Soft Computing.