Neurocomputing

Volume 130, 23 April 2014, Pages 53-62

Large margin principle in hyperrectangle learning

https://doi.org/10.1016/j.neucom.2013.02.042

Abstract

In this paper, we propose a new meta learning approach that incorporates the large margin principle into hyperrectangle based learning. The goal of Large Margin Rectangle Learning (LMRL) is to combine the natural interpretability of hyperrectangle models, such as decision trees and rule learners, with the risk minimization property associated with the large margin principle. Our approach consists of two basic steps: supervised clustering and decision boundary creation. In the first step, we apply a supervised clustering algorithm to generate an initial rectangle based generalization of the training data. Subsequently, these labeled clusters are used to produce a large margin hyperrectangle model. Besides the overall approach, we also developed Large Margin Supervised Clustering (LMSC), an attempt to introduce the large margin principle directly into the supervised clustering process. The corresponding experiments not only provided empirical evidence for the supposed margin-accuracy relation, but also showed that LMRL performs as well as or better than the decision tree and rule learners it was compared against. Altogether, this new learning approach is a promising way to create more accurate interpretable models.

Introduction

Besides pure classification accuracy, human interpretability is becoming more and more important in many real world applications. For example, the US federal law “Equal Credit Opportunity Act” requires financial institutions to provide substantive reasons when credit applications are rejected [1]. As low scores or bare classification results are clearly not sufficient reasons, such institutions have to use interpretable models instead, for example rule based ones, which provide more detailed explanations. Besides legal requirements, interpretability often plays a major role in the user acceptance of machine learning models. For example, in the medical domain most doctors are not willing to blindly prescribe treatments solely based on the diagnosis results of black box models. Enabling users to extract the learned concepts for validation and knowledge acquisition is another crucial advantage of comprehensibility.

In general, interpretability is a very subjective concept. It is hard both to formalize and to measure. Nevertheless, the hypothesis language is one of the main criteria. While, for example, decision rules are a very natural way of expressing knowledge, arbitrary hyperplanes in multidimensional feature spaces, polytopes or mixtures of Gaussian probability distributions are hard to grasp. In this paper, we focus on axis-parallel hyperrectangle models in numerical domains, as they can be directly transformed into equivalent decision rules. Among others, decision trees are an example of this model type.
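To make this rectangle-rule correspondence concrete, the following minimal Python sketch (our illustration, not taken from the paper) renders the bounds of a single labeled hyperrectangle as an equivalent conjunctive decision rule; the function and feature names are hypothetical.

```python
# Illustration only: an axis-parallel hyperrectangle is equivalent to a
# conjunction of interval conditions, i.e. a decision rule.

def rectangle_to_rule(r_min, r_max, label, feature_names):
    """Render a labeled hyperrectangle [r_min, r_max] as a decision rule."""
    conditions = [f"{lo} <= {name} <= {hi}"
                  for name, lo, hi in zip(feature_names, r_min, r_max)]
    return f"IF {' AND '.join(conditions)} THEN class = {label}"

print(rectangle_to_rule([0.0, 2.0], [5.0, 8.0], "A", ["x1", "x2"]))
# IF 0.0 <= x1 <= 5.0 AND 2.0 <= x2 <= 8.0 THEN class = A
```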

Despite their popularity in many application areas, interpretable models like decision trees are often discarded in favor of black box models due to their weaker prediction accuracy. Support vector machines in particular have become very popular over the last two decades. One of their key features, and a crucial performance factor, is the large margin principle. More precisely, it has been proven that the statistical risk bound of a linear classifier decreases as its margin grows. Nevertheless, the large margin principle is not limited to linear classifiers, but has also been proven relevant to other machine learning approaches such as boosting and clustering.
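For orientation, the quantity in question can be made explicit for linear classifiers. The following is standard large margin theory (cf. the survey quoted in Section 2 [3]), not the hyperrectangle margin defined later in Section 3:

```latex
% Geometric margin of a linear classifier f(x) = <w, x> + b evaluated on
% training examples (x_i, y_i) with y_i in {-1, +1}.
\[
  \gamma_i = \frac{y_i \, (\langle w, x_i \rangle + b)}{\lVert w \rVert},
  \qquad
  \gamma = \min_{i = 1, \dots, m} \gamma_i .
\]
```

Margin based generalization bounds decrease as $\gamma$ grows, which is the precise sense in which a larger margin lowers the bound on the statistical risk.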

In this paper, we are introducing a new way to combine the interpretability of hyperrectangle models and the beneficial effects of the large margin principle. As a foundation, we describe the basic ideas behind our approach and provide a formalized margin criterion for hyperrectangle models in Section 3. Subsequently, we present the new meta learning approach LMRL and describe the applied algorithms in Section 4. And finally, we provide experimental results in Section 5.

Section snippets

Supervised learning and the large margin principle

“The concept of Large Margins has recently been identified as a unifying principle for analyzing many different approaches to the problem of learning to classify data from examples, including Boosting [2], Mathematical Programming, Neural Networks and Support Vector Machines. The fact that it is the margin or confidence level of a classification (i.e., a scale parameter) rather than the raw training error that matters has become a key tool in recent years when dealing with classifiers” [3].


Hyperrectangle models as supervised clustering

In this paper, we specifically consider supervised learning restricted to hyperrectangles as the hypothesis language. Based on a given training set, the goal is to infer corresponding classification models with both high prediction accuracy and good interpretability.

Definition 3.1 Hyperrectangle Models

Classification models in numerical feature space, which are defined by a set of axis-parallel hyperrectangles $\{r_1, \ldots, r_m\}$. The corresponding hyperrectangle boundaries are represented by $R^{\max}$ and $R^{\min}$. Thereby, $R^{\max}_{i,d}$ is the upper bound of
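A minimal Python sketch of this definition, assuming NumPy; the class name `HyperrectangleModel` and its methods are our own, not the paper's.

```python
# Sketch of Definition 3.1: a model is a set of m labeled axis-parallel
# hyperrectangles with bound matrices R_min[i, d] and R_max[i, d].
import numpy as np

class HyperrectangleModel:
    def __init__(self, R_min, R_max, labels):
        self.R_min = np.asarray(R_min, dtype=float)  # lower bounds, shape (m, d)
        self.R_max = np.asarray(R_max, dtype=float)  # upper bounds, shape (m, d)
        self.labels = list(labels)                   # class label per rectangle

    def covering(self, x):
        """Return the indices of all rectangles containing the point x."""
        x = np.asarray(x, dtype=float)
        inside = np.all((self.R_min <= x) & (x <= self.R_max), axis=1)
        return np.flatnonzero(inside)

model = HyperrectangleModel([[0, 2]], [[5, 8]], ["A"])
print(model.covering([3, 4]))  # [0]: the point lies in rectangle r_1
```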

The basic approach

Large Margin Rectangle Learning (LMRL) is a new meta learning approach that aims to incorporate the large margin principle into the generation of hyperrectangle models. It differs significantly from existing divide-and-conquer and separate-and-conquer approaches, where decision boundaries are created directly from the given training examples. In contrast, LMRL is a two-step approach.

In the first step, called supervised clustering, we aim to find a suitable representation of the given example
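Since the snippet breaks off here, the following skeleton only mirrors the two-step control flow described above; the two callables stand in for the paper's concrete algorithms (e.g. LMSC or LearnRight for step one) and are assumptions on our part.

```python
# Two-step LMRL meta structure: (1) supervised clustering generalizes the
# training data into labeled rectangular clusters, (2) large margin decision
# boundaries are created between clusters rather than between raw examples.

def lmrl(X, y, supervised_clustering, create_boundaries):
    clusters = supervised_clustering(X, y)  # step 1, e.g. LMSC or LearnRight
    return create_boundaries(clusters)      # step 2, margin-driven boundaries
```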

Experimental settings

In our experiments, we studied LMSC and LearnRight separately and both in combination with LMRL. Moreover, we compared them to the decision tree learner C4.5, the rule learner RIPPER, a k-nearest neighbor classifier, a naive Bayes classifier and an SVM. For this purpose, we primarily used 10 numerical and normalized benchmark data sets

Related work

One of the first explicit hyperrectangle based learning approaches was proposed in [14] as the Nested Generalized Exemplar (NGE) framework. The authors directly generated hyperrectangles by iteratively aggregating examples from the given training set. Following the nearest neighbor principle, unseen examples are then classified according to their nearest rectangle; a simplified sketch of this rule is given below. LearnRight [7], which we used as a supervised clustering method, is an extension of this basic approach. Other
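The nearest-rectangle rule underlying NGE can be sketched as follows (simplified: the original framework additionally uses feature and exemplar weights); the helper names are ours.

```python
# Simplified nearest-rectangle classification in the spirit of NGE [14]:
# a point's distance to an axis-parallel rectangle is its distance to the
# clamped projection onto that rectangle (zero if the point lies inside).
import numpy as np

def rectangle_distance(x, r_min, r_max):
    projection = np.clip(x, r_min, r_max)  # closest point of the rectangle
    return np.linalg.norm(x - projection)

def nearest_rectangle_label(x, R_min, R_max, labels):
    x = np.asarray(x, dtype=float)
    dists = [rectangle_distance(x, lo, hi) for lo, hi in zip(R_min, R_max)]
    return labels[int(np.argmin(dists))]
```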

Conclusion

The purpose of our work has been to improve the accuracy of interpretable hyperrectangle models. To this end, we identified the large margin principle, successfully applied in other machine learning areas, as a promising approach. Based on a new formal margin definition for hyperrectangle models, which naturally supports multiclass problems, we introduced a corresponding novel meta learning approach. Large Margin Rectangle Learning aims to optimize the global configuration margin by a two-step

References (21)

  • J. Huysmans et al., Using rule extraction to improve the comprehensibility of predictive models, Social Science Research Network (2006).
  • R. Schapire et al., Boosting the margin: a new explanation for the effectiveness of voting methods, Annals of Statistics (1998).
  • A.J. Smola, Advances in Large Margin Classifiers (2000).
  • V.N. Vapnik, Statistical Learning Theory (1998).
  • J. Sinkkonen, S. Kaski, J. Nikkilä, Discriminative clustering: optimal contingency tables by learning metrics, in: ...
  • N. Zeidat, Supervised Clustering: Algorithms and Applications (2005).
  • B.J. Gao, Hyper-rectangle-based discriminative data generalization and applications in data mining, Ph.D. Thesis, Simon...
  • D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning (1989).
  • C.F. Eick, N. Zeidat, Z. Zhao, Supervised clustering – algorithms and benefits, in: 16th IEEE International Conference...
  • K. Bennett, J. Blue, Optimal Decision Trees, Rensselaer Polytechnic Institute Math Report, ...


Matthias Kirmse received his B.S. and Diploma degrees in computer science from the Dresden University of Technology in 2007 and 2008 respectively. He is currently a Ph.D. candidate at the Artificial Intelligence Institute of Dresden University of Technology. His research interests include data mining, theoretical and applied machine learning as well as fault detection and diagnosis.

Uwe Petersohn studied Computer Science at the Technical University of Dresden, where he received his Promotion in 1975 and his Habilitation, on Discrete Optimization, in 1981. He was a Scientific Assistant in the Computer Science Department (1974–1981), then a member of the research staff and head of a group at the Research Centre of the Robotron Company, Dresden (1981–1986). Since 1986 he has been a Lecturer and Assistant Professor at TUD, Department of Computer Science, and Head of the group Applied Knowledge Representation and Reasoning. His areas of interest are knowledge representation and reasoning, problem solving, reasoning with uncertain knowledge, case based reasoning, complex decisions, discrete optimization, hybrid knowledge models, machine learning and the design of applications.
