Pattern Recognition Letters

Volume 31, Issue 2, 15 January 2010, Pages 125-132

Forests of nested dichotomies

https://doi.org/10.1016/j.patrec.2009.09.015

Abstract

Ensemble methods are often able to generate classifiers that are more accurate than the individual classifiers they combine. In multiclass problems, an ensemble can be obtained by combining binary classifiers. It can be sensible to construct these binary classifiers with a multiclass method, because the ensemble of binary classifiers can be more accurate than the individual multiclass classifier.

Ensemble of nested dichotomies (END) is a method for dealing with multiclass classification problems using binary classifiers. A nested dichotomy organizes the classes in a tree; each internal node has a binary classifier. A given set of classes can be organized in a nested dichotomy in different ways. An END is formed by several nested dichotomies.

This paper studies the use of this method in conjunction with ensembles of decision trees (forests). Although forest methods are able to deal directly with several classes, their accuracies can be improved if they are used as base classifiers for ensembles of nested dichotomies. Moreover, the accuracies can be improved even further using forests of nested dichotomies, that is, ensemble methods whose base classifiers are nested dichotomies of decision trees. The improvements over forest methods can be explained by the increased diversity of the base classifiers. The best overall results were obtained using MultiBoost with resampling.

Introduction

Some methods for constructing classifiers are inherently binary (e.g., support vector machines). Other methods were first devised for binary problems, although they were later extended to the multiclass case (e.g., decision trees, logistic regression, some neural networks). Hence, several approaches have been proposed for using binary methods with multiclass problems. Interestingly, these methods for binarizing multiclass problems can be useful even for methods that are able to construct multiclass classifiers directly, because they can improve the accuracy of the classifiers (Anand et al., 1995, Fürnkranz, 2002, Frank and Kramer, 2004). Therefore, they can be considered ensemble methods, because the obtained classifiers are formed by several classifiers.

There are two basic approaches for combining binary classifiers in multiclass problems. The first constructs a classifier for each class (Anand et al., 1995, Rifkin and Klautau, 2004); each classifier discriminates between one class and all the others. This approach is called one vs. all or one vs. the rest. The second approach constructs, for each pair of classes, a classifier that discriminates between them (Hastie and Tibshirani, 1998, Fürnkranz, 2002, Quost et al., 2007). This approach is called one vs. one, pairwise, or round robin classification.
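
As an illustration, the sketch below shows how these two decompositions can be implemented around any binary learner. It is only a sketch, assuming a scikit-learn style fit/predict interface; make_clf is a hypothetical factory for the base learner and is not part of the original paper.

    # Minimal sketch of the two basic binarization schemes (illustrative only,
    # not the paper's implementation). `make_clf` is an assumed factory that
    # returns a scikit-learn style binary learner.
    from itertools import combinations
    import numpy as np

    def one_vs_rest(make_clf, X, y):
        """One binary classifier per class: class c vs. all the other classes."""
        classes = np.unique(y)
        models = {c: make_clf().fit(X, (y == c).astype(int)) for c in classes}

        def predict(X_new):
            # Choose the class whose "this class vs. the rest" model is most confident.
            scores = np.column_stack([models[c].predict_proba(X_new)[:, 1] for c in classes])
            return classes[scores.argmax(axis=1)]

        return predict

    def one_vs_one(make_clf, X, y):
        """One binary classifier per pair of classes; prediction by voting."""
        classes = np.unique(y)
        models = {}
        for a, b in combinations(classes, 2):
            mask = np.isin(y, [a, b])
            models[(a, b)] = make_clf().fit(X[mask], (y[mask] == b).astype(int))

        def predict(X_new):
            index = {c: i for i, c in enumerate(classes)}
            votes = np.zeros((len(X_new), len(classes)), dtype=int)
            for (a, b), m in models.items():
                winners = np.where(m.predict(X_new) == 1, index[b], index[a])
                votes[np.arange(len(X_new)), winners] += 1
            return classes[votes.argmax(axis=1)]

        return predict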

There are more complex approaches. A method that combines the previous ones is presented in (García-Pedrajas and Ortiz-Boyer, 2006). In error-correcting output codes (ECOC) (Dietterich and Bakiri, 1995), each binary classifier discriminates between two non-empty, disjoint subsets of the set of classes whose union is the set of all the classes. That is, for every binary classifier, each original class has to belong to one of the two subsets. In (Allwein et al., 2000) a generalized approach is presented: the binary classifiers are trained to discriminate between two subsets of classes, but not every class has to appear in one of the subsets.
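
For illustration only, a small ECOC coding matrix and Hamming-distance decoding could look like the following sketch; the values and names are illustrative and are not taken from the cited papers.

    # Sketch of an ECOC coding matrix and Hamming-distance decoding
    # (illustrative values; not taken from Dietterich and Bakiri, 1995).
    import numpy as np

    # One row per class, one column per binary classifier: +1/-1 indicate the
    # side of each dichotomy the class belongs to.
    code = np.array([[+1, +1, +1],
                     [+1, -1, -1],
                     [-1, +1, -1],
                     [-1, -1, +1]])  # 4 classes, 3 binary problems

    def decode(binary_outputs):
        """Return the class whose code word is closest in Hamming distance."""
        distances = (code != np.asarray(binary_outputs)).sum(axis=1)
        return int(distances.argmin())

    print(decode([+1, -1, -1]))  # predicts class 1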

Ensemble of nested dichotomies (END) is a recent approach to this problem (Frank and Kramer, 2004). A nested dichotomy (ND) is a binary tree in which each node has an associated set of classes. In the internal nodes, the classes are split between the two children and a binary classifier discriminates between the two resulting subsets. END combines several nested dichotomies, where each tree is generated randomly. In this case the word “ensemble” does not denote a family of methods, but a specific one: the ensemble method used in END relies only on the randomness of the base classifiers (in this case nested dichotomies). Different classifiers can be obtained from the same training set because the base method has an intrinsic source of randomness.
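
As a rough sketch of this idea (illustrative names, and only one possible way of randomizing the split; it is not the implementation of Frank and Kramer, 2004), a nested dichotomy can be generated by recursively splitting the set of classes at random, and an END is then a collection of such trees:

    # Sketch of the random generation of a nested dichotomy (illustrative only).
    import random

    def random_dichotomy(classes, rng=random):
        classes = list(classes)
        if len(classes) == 1:
            return classes[0]                   # leaf: a single class
        rng.shuffle(classes)
        cut = rng.randint(1, len(classes) - 1)  # both subsets are non-empty
        return (random_dichotomy(classes[:cut], rng),
                random_dichotomy(classes[cut:], rng))

    # An END combines several randomly generated trees; each internal node of
    # each tree would be equipped with a binary classifier.
    end = [random_dichotomy(range(5)) for _ in range(10)]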

Another approach for using nested dichotomies is presented in (Pujol et al., 2006, Escalera et al., 2007). In this case only one tree is constructed, but instead of generating it randomly, an optimization criterion is used. The tree is not used directly; it is used to generate an ECOC code.

ENDs have been studied with decision trees and logistic regression as binary classifiers. This paper studies their use with ensembles (e.g., bagging, boosting) of decision trees as binary classifiers. This approach improves on the results of ENDs of decision trees and of forests of multiclass trees. Moreover, another way of combining classical ensemble methods with nested dichotomies is considered: it can be seen as replacing the ensemble method used in END with another ensemble method, where NDs of decision trees are used as the base classifiers. This approach gives even better accuracies.

Although the presented method could be used with ensembles of classifiers obtained using any method, this paper considers decision trees. They are very commonly used as base classifiers in ensemble methods: they can handle mixed-type variables, are fast, and are sensitive to changes in the training data. The last property is relevant because diversity among the base classifiers is desirable in classifier ensembles.

The rest of the paper is organised as follows. Section 2 gives a brief introduction to ensembles of nested dichotomies and describes how to use them with decision forests. The experimental study, using 44 datasets and 51 variants of methods, is presented in Section 3. In Section 4 kappa-error diagrams are used to analyze the relationship between the ensemble methods when decision trees and nested dichotomies of decision trees are used as base classifiers. Finally, Section 5 presents some concluding remarks.


Nested dichotomies and decision forests

A nested dichotomy (Frank and Kramer, 2004) is a tree with the following properties:

  • Each node has an associated non-empty set of classes.

  • The root node includes all the classes, while the leaf nodes include only one class.

  • The tree is strictly binary, that is, all the non-leaf nodes have two children.

  • The classes in two siblings form a partition of the classes in the parent node. That is, their intersection is empty and their union is the set of all the classes in the parent.

  • Each internal node has a binary classifier that discriminates between the classes assigned to its two children, as sketched below.
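
The sketch below summarizes this structure; it is only illustrative, and it assumes a scikit-learn style binary classifier attached to each internal node.

    # Minimal sketch of a nested dichotomy node (illustrative; the binary
    # classifier is assumed to expose a scikit-learn style predict method).
    import numpy as np

    class NDNode:
        def __init__(self, classes, clf=None, left=None, right=None):
            self.classes = set(classes)  # non-empty set of classes at this node
            self.clf = clf               # binary classifier (internal nodes only)
            self.left = left             # child with one subset of the classes
            self.right = right           # child with the complementary subset

        def predict_one(self, x):
            """Route an instance down the tree until a single-class leaf is reached."""
            node = self
            while node.clf is not None:
                go_right = node.clf.predict(np.asarray(x).reshape(1, -1))[0] == 1
                node = node.right if go_right else node.left
            (only_class,) = node.classes  # leaves contain exactly one class
            return only_class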

Experimental validation

Table 1 shows a summary of the datasets used in the experiments. They are from the UCI repository (Asuncion and Newman, 2007).

Kappa-error diagrams

The advantage of FND over F can be a consequence of the increased diversity of the classifiers in the ensemble. In order to test this hypothesis we use kappa-error diagrams (Margineantu and Dietterich, 1997). In these diagrams, each pair of classifiers in the ensemble is represented by a point, so an ensemble is represented as a cloud of points. The x coordinate corresponds to the diversity of the two classifiers, according to the κ measure. The y coordinate is the average error of the two classifiers.
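
As an illustration of how one such point can be computed, the following is a sketch with illustrative names, not the authors' code.

    # Sketch of the kappa measure and the (kappa, error) point used in
    # kappa-error diagrams (Margineantu and Dietterich, 1997); illustrative only.
    import numpy as np

    def kappa(pred_a, pred_b, n_classes):
        """Agreement between two classifiers, corrected for chance."""
        pred_a, pred_b = np.asarray(pred_a), np.asarray(pred_b)
        n = len(pred_a)
        table = np.zeros((n_classes, n_classes))
        for i, j in zip(pred_a, pred_b):
            table[i, j] += 1                                 # A predicts i, B predicts j
        observed = np.trace(table) / n                       # observed agreement
        expected = (table.sum(0) / n) @ (table.sum(1) / n)   # agreement expected by chance
        return (observed - expected) / (1 - expected)

    def diagram_point(pred_a, pred_b, y_true, n_classes):
        """One point of the diagram: x = kappa, y = average error of the pair."""
        y_true = np.asarray(y_true)
        err = 0.5 * ((np.asarray(pred_a) != y_true).mean() +
                     (np.asarray(pred_b) != y_true).mean())
        return kappa(pred_a, pred_b, n_classes), err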

Conclusion

For multiclass problems, ensembles of decision trees can be successfully combined with ensembles of nested dichotomies. The direct approach, using ensembles of nested dichotomies with a forest method as the base classifier, can be improved by using ensemble methods with a nested dichotomy of decision trees as the base classifier. These claims are supported by an experimental study with 44 multiclass datasets and 51 variants of methods. The best overall results were obtained using forests of nested dichotomies based on MultiBoost with resampling.

Acknowledgements

This work has been supported by the “Junta de Castilla y León” project BU007B08.

We wish to thank the developers of Weka. We also express our gratitude to the donors of the different datasets and the maintainers of the UCI Repository.

References (22)

  • T.G. Dietterich, G. Bakiri. Solving multiclass learning problems via error-correcting output codes. J. Artificial Intell. Res. (1995).