
Canonical Forest


Abstract

We propose a new classification ensemble method named Canonical Forest. The new method uses canonical linear discriminant analysis (CLDA) and bootstrapping to obtain accurate and diverse classifiers that constitute an ensemble. We note that CLDA serves as a linear transformation tool rather than a dimension reduction tool. Because CLDA finds a transformed space in which the class distributions are more widely separated, classifiers built in this space are expected to be more accurate than those built in the original space. To further promote the diversity of the classifiers in an ensemble, CLDA is applied only to a partial feature space for each bootstrap sample. To compare the performance of Canonical Forest with that of other widely used ensemble methods, we tested them on 29 real or artificial data sets. Canonical Forest was significantly more accurate than the other ensemble methods on most data sets. An investigation of the bias and variance decomposition shows that the success of Canonical Forest can be attributed to variance reduction.



Acknowledgments

Hyunjoong Kim’s work was partly supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science, and Technology (2012R1A1A2042177). Hongshik Ahn’s work was partially supported by the IT Consilience Creative Project through the Ministry of Knowledge Economy, Republic of Korea.

Author information


Correspondence to Hongshik Ahn.

Appendices

Appendix 1: Pseudocode of CLDA

Given:

  • \(X\): the objects in the training data set (an \(n \times p\) matrix)

  • \(C\): number of classes

  • \(p\): number of variables

  • \(n_i\): number of instances in class \(i\)

  • \(S_i\): covariance matrix of class \(i\)

Procedure:

  1. Compute the class centroid matrix \(M_{C \times p}\), where the \((i,j)\) entry is the mean of class \(i\) for variable \(j\)

  2. Compute the common covariance matrix \(W\):

    $$\begin{aligned} W = \sum _{i=1}^C \left( n_i - 1\right) S_i \end{aligned}$$

  3. Compute \(M^* = MW^{-1/2}\) using the eigendecomposition of \(W\)

  4. Obtain the between-class covariance matrix \(B^*\) by computing the covariance matrix of \(M^*\)

  5. Compute the eigendecomposition of \(B^*\) such that \(B^* = VDV^T\)

  6. The columns \(v_i\) of \(V\) define the coordinates of the optimal subspace

  7. Convert \(X\) to the coordinates in the new subspace:

    $$\begin{aligned} Z_i = v_i^TW^{-1/2}X \end{aligned}$$

  8. \(Z_i\) is the \(i\)th canonical coordinate
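The following is a minimal sketch of this procedure in Python, assuming only NumPy. The function name clda_coefficients is hypothetical, and \(W\) is assumed to be nonsingular (as when CLDA is applied to a low-dimensional feature subset, as in Appendix 2).

import numpy as np

def clda_coefficients(X, y):
    """Return a matrix A of CLDA coefficients so that Z = X @ A."""
    classes = np.unique(y)
    p = X.shape[1]

    # Step 1: class centroid matrix M (C x p)
    M = np.vstack([X[y == c].mean(axis=0) for c in classes])

    # Step 2: common covariance matrix W = sum_i (n_i - 1) S_i
    W = np.zeros((p, p))
    for c in classes:
        Xc = X[y == c]
        W += (len(Xc) - 1) * np.cov(Xc, rowvar=False)

    # Step 3: W^{-1/2} from the eigendecomposition of W (W assumed nonsingular)
    w_vals, w_vecs = np.linalg.eigh(W)
    W_inv_sqrt = w_vecs @ np.diag(1.0 / np.sqrt(w_vals)) @ w_vecs.T

    # Steps 3-4: sphere the class centroids, then take their covariance B*
    M_star = M @ W_inv_sqrt
    B_star = np.cov(M_star, rowvar=False)

    # Steps 5-6: B* = V D V^T; order the columns of V by decreasing eigenvalue
    b_vals, V = np.linalg.eigh(B_star)
    V = V[:, np.argsort(b_vals)[::-1]]

    # Steps 7-8: Z = X @ (W^{-1/2} V); column i of Z is the i-th canonical coordinate
    return W_inv_sqrt @ V

For example, Z = X @ clda_coefficients(X, y) gives the training data in canonical coordinates.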

Appendix 2: Pseudocode of Canonical Forest

Input:

  • \(X\): training data composed of \(n\) instances (an \(n \times p\) matrix)

  • \(Y\): the labels of the training data (an \(n \times 1\) vector)

  • \(B\): number of classifiers in an ensemble

  • \(K\): number of subsets

  • \(w = (1,\ldots , C)\): set of class labels

Training Phase: For \(i = 1,\ldots , B\)

  1. Randomly split \(F\) (the feature set) into \(K\) subsets \(F_{i,j}\) (for \(j = 1,\ldots , K\))

  2. For \(j = 1,\ldots , K\):

    \(\diamond \) Let \(X_{i,j}\) be the data matrix that corresponds to the features in \(F_{i,j}\)

    \(\diamond \) Draw a bootstrap sample \(X'_{i,j}\) (with sample size 75 % of the number of instances in \(X_{i,j}\)) from \(X_{i,j}\)

    \(\diamond \) Apply CLDA on \(X'_{i,j}\) to obtain a coefficient matrix \(A_{i,j}\)

  3. Arrange \(A_{i,j}\) (\(j = 1,\ldots , K\)) into a block diagonal matrix \(R_i\)

  4. Construct the rotation matrix \(R^a_i\) by rearranging the rows of \(R_i\) so that they correspond to the original order of features in \(F\)

  5. Use (\(XR^a_i, Y\)) as the training data to build a classifier \(L_i\)

Test Phase: For a given instance \(x\), the predicted class label from the ensemble \(L\) is:

$$\begin{aligned} L(x) = \mathop {\arg \max }\limits _{y\in w} \sum _{i=1}^B I(L_i(xR^a_i) = y) \end{aligned}$$
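As a concrete illustration, here is a minimal Python sketch of the training and test phases, reusing clda_coefficients from the sketch in Appendix 1. The choice of scikit-learn decision trees as the base learners \(L_i\), the default values of \(B\) and \(K\), and the function names are illustrative assumptions, not the authors' implementation.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
# assumes clda_coefficients from the Appendix 1 sketch is in scope

def train_canonical_forest(X, Y, B=50, K=3, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    rotations, learners = [], []
    for _ in range(B):
        # Step 1: randomly split the feature set F into K subsets F_ij
        subsets = np.array_split(rng.permutation(p), K)

        # Steps 2-4: bootstrap each subset (75 % of n), apply CLDA, and place
        # each coefficient block A_ij at the original feature positions, which
        # yields the rearranged rotation matrix R^a_i directly (up to a
        # harmless permutation of the transformed columns)
        R_a = np.zeros((p, p))
        for F_ij in subsets:
            rows = rng.choice(n, size=int(0.75 * n), replace=True)
            A_ij = clda_coefficients(X[np.ix_(rows, F_ij)], Y[rows])
            R_a[np.ix_(F_ij, F_ij)] = A_ij

        # Step 5: build classifier L_i on the rotated training data (X R^a_i, Y)
        learners.append(DecisionTreeClassifier().fit(X @ R_a, Y))
        rotations.append(R_a)
    return rotations, learners

def predict_canonical_forest(rotations, learners, x):
    # Test phase: each L_i votes on x rotated by its own R^a_i; the label with
    # the most votes (the argmax of the indicator sum above) is returned
    votes = [L.predict(x.reshape(1, -1) @ R)[0]
             for R, L in zip(rotations, learners)]
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]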

Cite this article

Chen, YC., Ha, H., Kim, H. et al. Canonical Forest. Comput Stat 29, 849–867 (2014). https://doi.org/10.1007/s00180-013-0466-x