Pattern Recognition

Volume 69, September 2017, Pages 1-13

A parameter randomization approach for constructing classifier ensembles

https://doi.org/10.1016/j.patcog.2017.03.031

Highlights

  • We propose a novel randomization-based approach for classifier ensemble construction.

  • It samples the parameters of the base classifiers from a pre-defined distribution.

  • As an example we derive the parameter distribution of some linear bagged classifiers.

  • We then simulate bagging by using the derived distribution.

Abstract

Randomization-based techniques for classifier ensemble construction, like Bagging and Random Forests, are well known and widely used. They consist of independently training the ensemble members on random perturbations of the training data or random changes of the learning algorithm. We argue that randomization techniques can also be defined by directly manipulating the parameters of the base classifier, i.e., by sampling their values from a given probability distribution. A classifier ensemble can thus be built without manipulating the training data or the learning algorithm, and without running the learning algorithm to obtain the individual classifiers. The key issue is to define a suitable parameter distribution for a given base classifier. This also allows one to re-implement existing randomization techniques by sampling the classifier parameters from the distribution implicitly defined by such techniques, if it is known or can be approximated, instead of explicitly manipulating the training data and running the learning algorithm. In this work we provide a first investigation of our approach, starting from an existing randomization technique (Bagging): we analytically approximate the parameter distribution it induces on three well-known classifiers (nearest-mean, linear and quadratic discriminant), and empirically show that sampling from this distribution generates ensembles very similar to Bagging. We also give a first example of the definition of a novel randomization technique based on our approach.

Introduction

Ensemble methods have become a state-of-the-art approach for classifier design [1], [2]. Among them, ensemble construction techniques based on randomization are well known and widely used, e.g., Bagging [6], the Random Subspace Method [3], Random Forests [4], and the more recent Rotation Forests [7]. Randomization techniques have been formalized in [4] as independently learning several individual classifiers using a given learning algorithm, after randomly manipulating the training data or the learning algorithm itself. For instance, Bagging and the Random Subspace Method consist of learning each individual classifier on a bootstrap replicate of the original training set and on a random subset of the original features, respectively; Random Forests (ensembles of decision trees) combine bootstrap sampling of the original training set with a random selection of the splitting attribute at each node, chosen among the most discriminative ones.
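
As a point of reference for the discussion below, here is a minimal sketch of Bagging with majority voting. The choice of scikit-learn's DecisionTreeClassifier as base learner, the number of members, and the ±1 label encoding are illustrative assumptions, not prescriptions of the paper:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_members=50, random_state=0):
    """Train each ensemble member on a bootstrap replicate of (X, y)."""
    rng = np.random.default_rng(random_state)
    n = len(y)
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, n, size=n)  # n indices drawn with replacement
        members.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return members

def bagging_predict(members, X):
    """Majority vote, assuming labels encoded as +1 / -1."""
    votes = np.stack([clf.predict(X) for clf in members])  # shape (n_members, n_samples)
    return np.where(votes.mean(axis=0) >= 0, 1, -1)
```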

The main effect of randomization techniques, and in particular Bagging, is generally believed to be the reduction of the variance of the loss function of the base classifier. Accordingly, they are especially effective for unstable classifiers, i.e., classifiers that exhibit large changes in their output as a consequence of small changes in the training set, such as decision trees and neural networks, as opposed to, e.g., the nearest neighbor classifier [6]. It is worth noting that randomization techniques operate in parallel, unlike another state-of-the-art approach, boosting, which is a sequential ensemble construction technique [8].

In this work we propose a new approach for defining randomization techniques, inspired by the fact that existing ones can be seen as implicitly inducing a probability distribution on the parameters of a base classifier. Accordingly, we propose that new randomization techniques can be obtained by directly defining a suitable parameter distribution for a given classifier, as a function of the training set at hand; an ensemble can therefore be built by directly sampling the parameter values of its members from such a distribution, without actually manipulating the available training data or running the learning algorithm. In this way, an ensemble can be obtained even without access to the training set, as long as a pre-trained classifier is available. Some information about the training set, such as the mean and covariance matrix, is enough to apply our method, and it can be obtained from a pre-trained classifier.
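
The following sketch illustrates the idea for a nearest-mean classifier: each ensemble member is obtained by sampling its parameters (the two class means) from an assumed Gaussian distribution centred on the estimated means, rather than by re-training on manipulated data. The Gaussian form and the scaling S_k/n_k are illustrative assumptions; the distribution actually induced by Bagging is derived in Section 4.

```python
import numpy as np

def sample_nmc_ensemble(mu1, S1, n1, mu2, S2, n2, n_members=50, random_state=0):
    """Illustrative parameter-randomization ensemble for a nearest-mean classifier.

    Each member's parameters (the two class means) are drawn from an assumed
    Gaussian with mean mu_k and covariance S_k / n_k, instead of re-running
    a learning algorithm on perturbed training data.
    """
    rng = np.random.default_rng(random_state)
    members = []
    for _ in range(n_members):
        m1 = rng.multivariate_normal(mu1, S1 / n1)
        m2 = rng.multivariate_normal(mu2, S2 / n2)
        members.append((m1, m2))
    return members

def nmc_predict(members, X):
    """Majority vote of nearest-mean members, labels in {+1, -1}."""
    votes = []
    for m1, m2 in members:
        d1 = np.linalg.norm(X - m1, axis=1)
        d2 = np.linalg.norm(X - m2, axis=1)
        votes.append(np.where(d1 <= d2, 1, -1))
    return np.where(np.mean(votes, axis=0) >= 0, 1, -1)
```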

Our approach also allows a different implementation of existing randomization techniques. If the distribution induced by a given technique on the parameters of a given base classifier is known or can be approximated, one could build an ensemble as described above, instead of running the corresponding procedure and then the learning algorithm.

As mentioned above, the key issue of our approach is to define a suitable parameter distribution for a given base classifier, i.e., one capable of providing a trade-off between accuracy and diversity of the resulting classifiers that is advantageous in terms of ensemble performance. To our knowledge, no previous work has investigated the distribution of classifier parameters induced by randomization techniques, which is not a straightforward problem. To take a first step in this direction, in this work we start from the analysis and modeling of the distribution induced by one of the most popular techniques, Bagging, on base classifiers that can be dealt with analytically: the nearest-mean, linear discriminant, and quadratic discriminant classifiers. We then assess the accuracy of our model by comparing the parameter distribution it predicts with the empirical one produced by Bagging. The results of our analysis, which have to be extended in future work to other base classifiers and randomization techniques, are aimed at obtaining insights into the parameter distributions induced by existing randomization techniques, and thus hints and guidelines for the definition of novel techniques based on our approach. We give a first example of the definition of a new randomization technique, starting from our model of the distribution induced by Bagging on the classifiers mentioned above.

The rest of this paper is structured as follows. In Section 2 we summarize the main relevant concepts about randomization techniques and Bagging. We then present our approach and describe the considered base classifiers in Section 3. In Section 4 we model the parameter distribution induced by Bagging on such classifiers. In Section 5 we empirically evaluate the accuracy of our model, and give an example of the definition of new randomization techniques based on our approach. In Section 6 we discuss limitations and extensions of our work.

Section snippets

Background

The notation used in this paper is summarized in Table 1. We shall use Greek letters to denote probability distribution parameters, and Roman letters for other quantities, including estimated distribution parameters (statistics); vectors in Roman letters will be written in bold. For a given statistic a estimated from a training set, we shall denote by a*(j) its j-th bootstrap replicate, and by a* the corresponding random variable.
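
To illustrate this notation, the hypothetical snippet below estimates a statistic a (here a sample mean) on a training sample and computes its bootstrap replicates a*(j), whose empirical distribution plays the role of the random variable a*:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=200)  # a hypothetical training sample
a = x.mean()                                  # the statistic a estimated on the training set

# a_star[j] corresponds to a*(j): the statistic re-estimated on the j-th bootstrap replicate
n_boot = 1000
a_star = np.array([rng.choice(x, size=x.size, replace=True).mean() for _ in range(n_boot)])

print(a, a_star.mean(), a_star.std())         # empirical distribution of the random variable a*
```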

Randomization techniques for ensemble construction can be

A parameter randomization approach for ensemble construction

Consider a given classification algorithm, e.g., a parametric linear classifier with discriminant function w·x+w0 implemented as the linear discriminant classifier (LDC), or a non-parametric neural network trained with the back-propagation algorithm. Let ψ denote the parameters that are set by the chosen learning algorithm L, e.g., the coefficients of the LDC (in this case, ψ=(w,w0)), or the connection weights of a neural network.
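
For concreteness, one common way the LDC parameters ψ=(w, w0) are obtained from the training data is via the class means and the pooled covariance matrix, as sketched below. The equal-prior assumption and this particular estimator are illustrative choices; the paper's exact definition of the LDC is given in Section 3.

```python
import numpy as np

def ldc_parameters(X1, X2):
    """Parameters psi = (w, w0) of a linear discriminant classifier.

    A common estimator based on the class means and the pooled covariance,
    assuming equal class priors. With this convention, x is assigned to
    class 1 when w @ x + w0 >= 0.
    """
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    S = ((n1 - 1) * np.cov(X1, rowvar=False)
         + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)  # pooled covariance
    w = np.linalg.solve(S, m1 - m2)
    w0 = -0.5 * (m1 + m2) @ w
    return w, w0
```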

Consider now any given randomization technique R (e.g.,

Joint parameter distribution of “bagged” classifiers

For the sake of simplicity, and with no loss of generality, in the following we consider two-class problems. All the results of this section can be easily extended to multi-class problems, as explained in Section 4.1.

Let $Y=\{+1,-1\}$ denote the class labels, and $n_1$ and $n_2$ (with $n=n_1+n_2$) the number of training instances from the two classes, i.e.: $T=\{(\mathbf{x}_i,+1)\}_{i=1}^{n_1}\cup\{(\mathbf{x}_i,-1)\}_{i=n_1+1}^{n_1+n_2}$. We make the usual assumption of i.i.d. training instances. To make analytical derivations possible, we consider
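
As a small illustration of the randomness introduced by bootstrap sampling in this two-class setting, the class-1 count in a bootstrap replicate of n instances follows a Binomial(n, n1/n) distribution; the snippet below checks this empirically. This is only one ingredient of the analysis, not the derivation carried out in this section:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 60, 40
n = n1 + n2
labels = np.array([+1] * n1 + [-1] * n2)

# Count class-1 instances in each of 10000 bootstrap replicates
counts = np.array([
    (labels[rng.integers(0, n, size=n)] == +1).sum()
    for _ in range(10000)
])
print(counts.mean(), n1)                              # both close to n1
print(counts.var(), n * (n1 / n) * (1 - n1 / n))      # both close to the binomial variance
```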

Experiments

To evaluate the proposed randomization approach we carry out experiments on 27 two-class data sets, using as base classifiers NMC, LDC and QDC. Our first aim is to verify whether and to what extent the parameter distribution of classifiers obtained by Bagging can be approximated by the ones we derived in Section 4. Secondly, we compare the classification performance of Bagging with that of classifier ensembles obtained by our approach using the parameter distributions derived for Bagging.
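
A rough sketch of the first kind of comparison, on synthetic data: the empirical distribution of one NMC parameter (the class-1 mean) under Bagging versus direct sampling from an assumed Gaussian approximation. The synthetic data and the Gaussian form are illustrative stand-ins for the data sets and the distributions actually used in the experiments:

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.multivariate_normal([0, 0], np.eye(2), size=100)  # hypothetical class-1 data
m1, S1 = X1.mean(axis=0), np.cov(X1, rowvar=False)

# Class-1 mean under Bagging (bootstrap replicates) vs. direct parameter sampling
bagged = np.array([X1[rng.integers(0, len(X1), size=len(X1))].mean(axis=0)
                   for _ in range(5000)])
sampled = rng.multivariate_normal(m1, S1 / len(X1), size=5000)

print(bagged.mean(axis=0), sampled.mean(axis=0))                    # centres agree
print(np.cov(bagged, rowvar=False), np.cov(sampled, rowvar=False))  # spreads are comparable
```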

Discussion and conclusions

We proposed a novel approach for defining and implementing randomization techniques for classifier ensemble construction. It is based on modeling the joint probability distribution of the parameters of a given base classifier, and then obtaining the ensemble members by directly sampling from such a distribution their parameter values, instead of manipulating the training data and running the learning algorithm for each of them. This approach can also be exploited as an alternative

Acknowledgment

This work has been partly supported by the project "Computational quantum structures at the service of pattern recognition: modeling uncertainty" [CRP-59872] funded by Regione Autonoma della Sardegna, L.R. 7/2007, Bando 2012.


References (23)

  • M. Skurichina et al.

    Bagging for linear classifiers

    Pattern Recognit.

    (1998)
  • J.T. Kent et al.

    Confidence intervals for the noncentral chi-squared distribution

J. Stat. Planning Inference

    (1995)
  • L.I. Kuncheva

    Combining Pattern Classifiers

    (2014)
  • Z.-H. Zhou

    Ensemble Methods: Foundations and Algorithms

    (2012)
  • T.K. Ho

    The random subspace method for constructing decision forests

    IEEE Trans. Patt. Anal. Mach. Intell.

    (1998)
  • L. Breiman

    Random forests

Machine Learning

    (2001)
  • P.J. Huber

    Robust Statistics

    (1981)
  • L. Breiman

    Bagging predictors

Machine Learning

    (1996)
  • J.J. Rodriguez et al.

    Rotation forest: a new classifier ensemble method

    IEEE Trans. Patt. Anal. Mach. Intell.

    (2006)
  • Y. Freund et al.

    Experiments with a new boosting algorithm

    Int. Conf. on Machine Learning

    (1996)
  • G. Fumera, F. Roli, A. Serrau, A theoretical analysis of bagging as a linear combination of classifiers, IEEE Trans....

    Enrica Santucci is a Research Fellow of Mathematical and Physical Modeling in Engineering. Her research interests are related to statistical pattern recognition, multiple classifier systems and phenomenology of neural networks.

    Luca Didaci is an Assistant Professor of Computer Engineering. His research interests are related to statistical pattern recognition, multiple classifier systems and adversarial classification, with applications to biometric recognition. He is a member of the IEEE and IAPR.

    Giorgio Fumera is an Associate Professor of Computer Engineering. His research interests are related to statistical pattern recognition, multiple classifier systems and adversarial classification, with applications to document categorization and person re-identification. He is a member of the IEEE and IAPR.

    Fabio Roli is a Full Professor of Computer Engineering. His research over the past twenty years addressed the design of pattern recognition systems in real applications. He played a leading role for the research field of multiple classifier systems. He is Fellow of the IEEE and Fellow of the IAPR.
