
Canonical Forest


Abstract

We propose a new classification ensemble method named Canonical Forest. The new method uses canonical linear discriminant analysis (CLDA) and bootstrapping to obtain accurate and diverse classifiers that constitute an ensemble. We note that CLDA serves as a linear transformation tool rather than a dimension reduction tool. Because CLDA finds a transformed space in which the class distributions are more widely separated, classifiers built in this space are expected to be more accurate than those built in the original space. To further promote the diversity of the classifiers in an ensemble, CLDA is applied only to a partial feature space for each bootstrap sample. To compare the performance of Canonical Forest with that of other widely used ensemble methods, we tested them on 29 real or artificial data sets. Canonical Forest was significantly more accurate than the other ensemble methods on most data sets. An investigation of the bias and variance decomposition shows that the success of Canonical Forest can be attributed to variance reduction.



Acknowledgments

Hyunjoong Kim’s work was partly supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science, and Technology (2012R1A1A2042177). Hongshik Ahn’s work was partially supported by the IT Consilience Creative Project through the Ministry of Knowledge Economy, Republic of Korea.

Author information


Correspondence to Hongshik Ahn.

Appendices

Appendix 1: Pseudocode of CLDA

Given:

  • \(X\): the objects in the training data set (an \(n \times p\) matrix)

  • \(C\): number of classes

  • \(p\): number of variables

  • \(n_i\): number of instances in class \(i\)

  • \(S_i\): covariance matrix of class \(i\)

Procedure:

  1. Compute the class centroid matrix \(M_{C \times p}\), where the \((i,j)\) entry is the mean of class \(i\) for variable \(j\)

  2. Compute the common covariance matrix \(W\):

    $$\begin{aligned} W = \sum _{i=1}^C \left( n_i - 1\right) S_i \end{aligned}$$

  3. Compute \(M^* = MW^{-1/2}\) using the eigendecomposition of \(W\)

  4. Obtain the between-class covariance matrix \(B^*\) by computing the covariance matrix of \(M^*\)

  5. Compute the eigendecomposition of \(B^*\) such that \(B^* = VDV^T\)

  6. The columns \(v_i\) of \(V\) define the coordinates of the optimal subspace

  7. Convert \(X\) to the coordinates in the new subspace:

    $$\begin{aligned} Z_i = v_i^TW^{-1/2}X \end{aligned}$$

  8. \(Z_i\) is the \(i\)th canonical coordinate
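The following is a minimal sketch of this procedure in Python, assuming only NumPy. The function name clda_coefficients is hypothetical, and \(W\) is assumed to be nonsingular (as when CLDA is applied to a low-dimensional feature subset, as in Appendix 2).

import numpy as np

def clda_coefficients(X, y):
    """Return a matrix A of CLDA coefficients so that Z = X @ A."""
    classes = np.unique(y)
    p = X.shape[1]

    # Step 1: class centroid matrix M (C x p)
    M = np.vstack([X[y == c].mean(axis=0) for c in classes])

    # Step 2: common covariance matrix W = sum_i (n_i - 1) S_i
    W = np.zeros((p, p))
    for c in classes:
        Xc = X[y == c]
        W += (len(Xc) - 1) * np.cov(Xc, rowvar=False)

    # Step 3: W^{-1/2} from the eigendecomposition of W (W assumed nonsingular)
    w_vals, w_vecs = np.linalg.eigh(W)
    W_inv_sqrt = w_vecs @ np.diag(1.0 / np.sqrt(w_vals)) @ w_vecs.T

    # Steps 3-4: sphere the class centroids, then take their covariance B*
    M_star = M @ W_inv_sqrt
    B_star = np.cov(M_star, rowvar=False)

    # Steps 5-6: B* = V D V^T; order the columns of V by decreasing eigenvalue
    b_vals, V = np.linalg.eigh(B_star)
    V = V[:, np.argsort(b_vals)[::-1]]

    # Steps 7-8: Z = X @ (W^{-1/2} V); column i of Z is the i-th canonical coordinate
    return W_inv_sqrt @ V

For example, Z = X @ clda_coefficients(X, y) gives the training data in canonical coordinates.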

Appendix 2: Pseudocode of Canonical Forest

Input:

  • \(X\): training data composed of \(n\) instances (an \(n \times p\) matrix)

  • \(Y\): the labels of the training data (an \(n \times 1\) vector)

  • \(B\): number of classifiers in an ensemble

  • \(K\): number of subsets

  • \(w = (1,\ldots , C)\): set of class labels

Training Phase: For \(i = 1,\ldots , B\)

  1. Randomly split \(F\) (the feature set) into \(K\) subsets \(F_{i,j}\) (for \(j = 1,\ldots , K\))

  2. For \(j = 1,\ldots , K\):

    \(\diamond \) Let \(X_{i,j}\) be the data matrix that corresponds to the features in \(F_{i,j}\)

    \(\diamond \) Draw a bootstrap sample \(X'_{i,j}\) (with sample size 75 % of the number of instances in \(X_{i,j}\)) from \(X_{i,j}\)

    \(\diamond \) Apply CLDA on \(X'_{i,j}\) to obtain a coefficient matrix \(A_{i,j}\)

  3. Arrange \(A_{i,j}\) (\(j = 1,\ldots , K\)) into a block diagonal matrix \(R_i\)

  4. Construct the rotation matrix \(R^a_i\) by rearranging the rows of \(R_i\) so that they correspond to the original order of features in \(F\)

  5. Use (\(XR^a_i, Y\)) as the training data to build a classifier \(L_i\)

Test Phase: For a given instance \(x\), the predicted class label from the ensemble \(L\) is:

$$\begin{aligned} L(x) = \mathop {\arg \max }\limits _{y\in w} \sum _{i=1}^B I(L_i(xR^a_i) = y) \end{aligned}$$
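As a concrete illustration, here is a minimal Python sketch of the training and test phases, reusing clda_coefficients from the sketch in Appendix 1. The choice of scikit-learn decision trees as the base learners \(L_i\), the default values of \(B\) and \(K\), and the function names are illustrative assumptions, not the authors' implementation.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
# assumes clda_coefficients from the Appendix 1 sketch is in scope

def train_canonical_forest(X, Y, B=50, K=3, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    rotations, learners = [], []
    for _ in range(B):
        # Step 1: randomly split the feature set F into K subsets F_ij
        subsets = np.array_split(rng.permutation(p), K)

        # Steps 2-4: bootstrap each subset (75 % of n), apply CLDA, and place
        # each coefficient block A_ij at the original feature positions, which
        # yields the rearranged rotation matrix R^a_i directly (up to a
        # harmless permutation of the transformed columns)
        R_a = np.zeros((p, p))
        for F_ij in subsets:
            rows = rng.choice(n, size=int(0.75 * n), replace=True)
            A_ij = clda_coefficients(X[np.ix_(rows, F_ij)], Y[rows])
            R_a[np.ix_(F_ij, F_ij)] = A_ij

        # Step 5: build classifier L_i on the rotated training data (X R^a_i, Y)
        learners.append(DecisionTreeClassifier().fit(X @ R_a, Y))
        rotations.append(R_a)
    return rotations, learners

def predict_canonical_forest(rotations, learners, x):
    # Test phase: each L_i votes on x rotated by its own R^a_i; the label with
    # the most votes (the argmax of the indicator sum above) is returned
    votes = [L.predict(x.reshape(1, -1) @ R)[0]
             for R, L in zip(rotations, learners)]
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]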

Cite this article

Chen, YC., Ha, H., Kim, H. et al. Canonical Forest. Comput Stat 29, 849–867 (2014). https://doi.org/10.1007/s00180-013-0466-x