Abstract
Given a variety of classifiers, a prevalent approach in classifier ensembles is to combine diverse classifier components, i.e., diversity-based ensembles, and many previous works show that such ensembles can improve classification accuracy. Random forests are among the most important ensembles. However, most random forests approaches that consider diversity focus on maximizing tree diversity while producing and training the component trees. Alternatively, we propose a novel cognitive-inspired diversity-based random forests method, diversity-based random forests via sample weight learning (DRFS). Given numerous component trees from the original random forests, DRFS selects and combines tree classifiers adaptively via diversity learning and sample weight learning. By designing a matrix that captures the data distribution, a unified optimization model is formulated to learn and select diverse trees, where tree weights are learned through a convex quadratic programming problem with sample weights. Moreover, a self-training algorithm is proposed to solve the convex optimization iteratively and learn sample weights automatically. Comparative experiments are conducted on 39 typical UCI classification benchmarks and a variety of real-world text categorization benchmarks. Extensive experiments show that our method outperforms traditional methods. The proposed DRFS method selects and combines tree classifiers adaptively and improves performance on a variety of classification tasks.
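The alternating scheme described above (tree weights learned on a sample-weighted objective, sample weights then updated from the current ensemble) can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's actual formulation: the diversity matrix and the convex QP are not reproduced, a simple multiplicative update stands in for a QP solver, and all names (`drfs_sketch`, `correct`) are hypothetical.

```python
import numpy as np

def drfs_sketch(correct, n_iters=10, lr=0.5):
    # correct: (n_trees, n_val) boolean matrix; correct[i, j] is True
    # if tree i classifies validation sample j correctly.
    n_trees, n_val = correct.shape
    w = np.full(n_trees, 1.0 / n_trees)   # tree weights on the simplex
    s = np.full(n_val, 1.0 / n_val)       # sample weights on the simplex
    C = correct.astype(float)
    for _ in range(n_iters):
        # Tree-weight step: multiplicative update toward trees with
        # high sample-weighted accuracy (stand-in for the convex QP).
        grad = C @ s                      # per-tree weighted accuracy
        w = w * np.exp(lr * grad)
        w /= w.sum()
        # Sample-weight step: emphasize samples the current weighted
        # ensemble still gets wrong (the self-training flavor).
        margin = C.T @ w                  # per-sample ensemble accuracy
        s = np.exp(-margin)
        s /= s.sum()
    return w, s
```

On a toy correctness matrix where one tree is reliable and another is not, the reliable tree's weight dominates after a few iterations, which is the qualitative behavior the alternating scheme aims for.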
Notes
From the view of ensemble pruning, Tsoumakas et al. presented a taxonomy of ensemble pruning methods, i.e., ranking based, clustering based, optimization based, and other categories [45]. Zhou divided related methods into three categories: ordering-based, clustering-based, and optimization-based pruning approaches [53]. More specifically, optimization-based pruning methods formulate ensemble pruning as an optimization problem that seeks the subset of available component classifiers maximizing (or minimizing) an objective related to the generalization ability of the final ensemble; this optimization-based view is also the focus of our paper.
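As a concrete illustration of the optimization-based view, one simple (hypothetical, not from the paper) instance is greedy forward selection: repeatedly add the component whose inclusion most improves majority-vote accuracy on a validation set, stopping when no addition helps. The function and variable names here are assumptions for illustration only.

```python
import numpy as np

def greedy_prune(preds, y, max_size):
    # preds: (n_clf, n_val) integer class predictions; y: (n_val,) labels.
    selected, remaining = [], list(range(len(preds)))
    best_acc = -1.0
    while remaining and len(selected) < max_size:
        # Score each candidate by majority-vote accuracy of the
        # would-be sub-ensemble on the validation set.
        scores = []
        for i in remaining:
            votes = preds[selected + [i]]
            maj = np.apply_along_axis(
                lambda v: np.bincount(v).argmax(), 0, votes)
            scores.append((maj == y).mean())
        j = int(np.argmax(scores))
        if scores[j] <= best_acc:
            break                          # no candidate improves the objective
        best_acc = scores[j]
        selected.append(remaining.pop(j))
    return selected, best_acc
```

Ordering-based methods differ mainly in fixing an inclusion order up front, while optimization-based methods like the one sketched here re-evaluate the objective at every step.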
Here, the optimization problem over w and Ω in an iterative learning framework is similar to an EM (expectation-maximization) procedure. We also initially used the EM algorithm in our method, but the results were not encouraging compared with this iterative learning algorithm. Adaptively designing, improving, and applying a variant of the EM algorithm for our DRFS method is a topic for future work.
In our experiments, this validation set is bootstrapped from the initial training set, and is used as the new training set for learning sample weights and classifier weights in the iterative learning algorithm for DRFS.
Note that in these experiments with parameters, the used validation set is the same as the validation set in Algorithm 1, and all other experimental conditions are the same as the ones in “Experimental Setup.”
In total, there are 20 DRFS ensembles corresponding to 20 different values of λ, given by 1/λ ∈ {0, 0.1,..., 1, 2,..., 10}.
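The parameter grid from this note can be enumerated directly (variable name `inv_lambda` is illustrative):

```python
# 1/λ takes 11 values from 0 to 1 in steps of 0.1, then 9 integer
# values from 2 to 10, for 20 settings in total.
inv_lambda = [round(0.1 * k, 1) for k in range(11)] + list(range(2, 11))
print(len(inv_lambda))
```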
References
Amasyali MF, Ersoy OK. Classifier ensembles with the extended space forest. IEEE Trans Knowl Data Eng 2014;26(3):549–62.
Amozegar M, Khorasani K. An ensemble of dynamic neural network identifiers for fault detection and isolation of gas turbine engines. Neural Netw 2016;76:106–21.
Ayerdi B, Graña M. Hybrid extreme rotation forest. Neural Netw 2014;52:33–42.
Ball K, Grant C, Mundy WR, Shafer TJ. A multivariate extension of mutual information for growing neural networks. Neural Netw 2017;95:29–43.
Bernard S, Adam S, Heutte L. Dynamic random forests. Pattern Recogn Lett 2012;33(12):1580–6.
Biau G. Analysis of a random forests model. J Mach Learn Res 2012;13:1063–95.
Brazdil P, Soares C. A comparison of ranking methods for classification algorithm selection. Proceedings of the 11th European Conference on Machine Learning, pp 63–74; 2000.
Breiman L. Bagging predictors. Mach Learn 1996;24(2):123–40.
Breiman L. Random forests. Mach Learn 2001;45:5–32.
Cardoso-Cachopo A. Improving methods for single-label text categorization. PhD Thesis. Instituto Superior Tecnico: Universidade Tecnica de Lisboa; 2007.
Chang CC, Lin CJ. LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2011;2(3):1–27. http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Debole F, Sebastiani F. An analysis of the relative hardness of Reuters-21578 subsets. JASIST 2005;56(6): 584–96. https://doi.org/10.1002/asi.20147.
Demsar J. Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 2006;7:1–30.
Frank A, Asuncion A. 2010. UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. http://archive.ics.uci.edu/ml.
Freund Y, Schapire R. Experiments with a new boosting algorithm. Proceedings of the 13th International Conference on Machine Learning, pp 148–156; 1996.
Freund Y, Schapire R. A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 1997;55(1):119–39.
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA data mining software: an update. SIGKDD Explor Newsl 2009;11(1):10–8. https://doi.org/10.1145/1656274.1656278.
Han EH, Karypis G. Centroid-based document classification: analysis and experimental results. Principles of Data Mining and Knowledge Discovery, 4th European Conference, PKDD 2000, Lyon, France, September 13-16, 2000, Proceedings, pp 424–431; 2000.
Hansen LK, Salamon P. Neural network ensembles. IEEE Trans Pattern Anal Mach Intell 1990;12(10): 993–1001.
Hothorn T, Hornik K, Zeileis A. Unbiased recursive partitioning: a conditional inference framework. J Comput Graph Stat 2006;15(3):651–74.
Huang K, Zhang R, Jin X, Hussain A. Special issue editorial: cognitively-inspired computing for knowledge discovery. Cogn Comput 2018;10(1):1–2.
Jiang L. Learning random forests for ranking. Frontiers of Computer Science in China 2011;5(1):79–86.
Jiang L, Wang S, Li C, Zhang L. Structure extended multinomial naive Bayes. Inf Sci 2016;329: 346–56.
Krogh A, Sollich P. Statistical mechanics of ensemble learning. Phys Rev E 1997;55(1):811–25.
Kuncheva LI, Whitaker CJ. Measures of diversity in classifier ensembles. Mach Learn 2003;51(2):181–207.
Li N, Yu Y, Zhou ZH. Diversity regularized ensemble pruning. Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases; 2012.
Liu FT, Ting KM. Variable randomness in decision tree ensembles. Advances in Knowledge Discovery and Data Mining, 10th Pacific-Asia Conference, PAKDD 2006, Singapore, April 9-12, 2006, Proceedings, pp 81–90; 2006.
Liu FT, Ting KM, Fan W. Maximizing tree diversity by building complete-random decision trees. Advances in Knowledge Discovery and Data Mining, 9th Pacific-Asia Conference, PAKDD 2005, Hanoi, Vietnam, May 18-20, 2005, Proceedings, pp 605–610; 2005.
Liu FT, Ting KM, Yu Y, Zhou ZH. Spectrum of variable-random trees. J Artif Intell Res (JAIR) 2008;32:355–84.
Lu Z, Wu X, Zhu X, Bongard J. Ensemble pruning via individual contribution ordering. Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, July 25-28, 2010, pp 871–880; 2010.
Lulli A, Oneto L, Anguita D. Mining big data with random forests. Cogn Comput 2019:1–23. Published online.
Margineantu D, Dietterich T. Pruning adaptive boosting. Proceedings of International Conference on Machine Learning, pp 211–218; 1997.
Martinez-Munoz G, Hernandez-Lobato D, Suarez A. An analysis of ensemble pruning techniques based on ordered aggregation. IEEE Trans Pattern Anal Mach Intell 2009;31(2):245–59.
McCallum A, Nigam K. 1998. A comparison of event models for naive Bayes text classification. In: Learning for text categorization: papers from the 1998 AAAI Workshop, pp 41–48. http://www.kamalnigam.com/papers/multinomial-aaaiws98.pdf.
Menze BH, Kelm BM, Splitthoff DN, Kothe U, Hamprecht FA. On oblique random forests. Proceedings of European Conference on Machine Learning and Knowledge Discovery in Databases, (ECML-PKDD’11), pp 453–469; 2011.
Opitz DW, Shavlik JW. Generating accurate and diverse members of a neural network ensemble. Advances in Neural Information Processing Systems (NIPS’96), pp 535–541. MIT Press; 1996.
Osadchy M, Keren D, Raviv D. Recognition using hybrid classifiers. IEEE Trans Pattern Anal Mach Intell 2016;38(4):759–71.
Perera AG, Law YW, Chahl JS. Human pose and path estimation from aerial video using dynamic classifier selection. Cogn Comput 2018;10(6):1019–41.
Qiu C, Jiang L, Li C. Randomly selected decision tree for test-cost sensitive learning. Appl Soft Comput 2017;53:27–33.
Quinlan JR. 1993. C4.5: Programs for machine learning. Morgan Kaufmann.
Robnik-Sikonja M. Improving random forests. Proceedings of 15th European Conference on Machine Learning (ECML’04), pp 359–370; 2004.
Rodriguez JJ, Kuncheva LI, Alonso CJ. Rotation forest: a new classifier ensemble method. IEEE Trans Pattern Anal Mach Intell 2006;28(10):1619–30.
Tang B, He H, Baggenstoss PM, Kay S. A Bayesian classification approach using class-specific features for text categorization. IEEE Trans Knowl Data Eng 2016;28(6):1602–06.
Trawinski K, Quirin A, Cordon O. On the combination of accuracy and diversity measures for genetic selection of bagging fuzzy rule-based multiclassification systems. Proceedings of the 9th Intelligent Systems Design and Applications, pp 121–127; 2009.
Tsoumakas G, Partalas I, Vlahavas I. An ensemble pruning primer. Applications of Supervised and Unsupervised Ensemble Methods, pp 1–13; 2009.
Wen G, Hou Z, Li H, Li D, Jiang L, Xun E. Ensemble of deep neural networks with probability-based fusion for facial expression recognition. Cogn Comput 2017;9(5):597–610.
Wolpert D. Stacked generalization. Neural Netw 1992;5(2):241–60.
Yang C, Yin XC, Hao HW. Diversity-based ensemble with sample weight learning. 22nd International Conference on Pattern Recognition, ICPR 2014, Stockholm, Sweden, August 24-28, 2014, pp 1236–1241; 2014.
Yang C, Yin XC, Huang K. Text categorization with diversity random forests. Neural Information Processing - 21st International Conference, ICONIP 2014, Kuching, Malaysia, November 3-6, 2014. Proceedings, Part III, pp 317–324; 2014.
Yin XC, Huang K, Hao HW, Iqbal K, Wang ZB. A novel classifier ensemble method with sparsity and diversity. Neurocomputing 2014;134:214–21.
Yin XC, Huang K, Yang C, Hao HW. Convex ensemble learning with sparsity and diversity. Inf Fusion 2014;20:49–59.
Zhang Y, Burer S, Street WN. Ensemble pruning via semi-definite programming. J Mach Learn Res 2006;7:1315–38.
Zhou ZH. Ensemble methods: foundations and algorithms. Boca Raton: Chapman & Hall/CRC; 2012.
Zhou ZH, Wu J, Tang W. Ensembling neural networks: many could be better than all. Artif Intell 2002; 137:239–63.
Funding
The research was partly supported by the National Natural Science Foundation of China (61473036), China Postdoctoral Science Foundation (2018M641199), and Beijing Natural Science Foundation (4194084).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Cite this article
Yang, C., Yin, XC. Diversity-Based Random Forests with Sample Weight Learning. Cogn Comput 11, 685–696 (2019). https://doi.org/10.1007/s12559-019-09652-0