Abstract
Diversity plays an important role in successful ensemble classification. One way to diversify the base-classifiers in an ensemble classifier is to diversify the data they are trained on. Sampling techniques such as bagging have been used for this task in the past; however, we argue that because they preserve the global distribution of the data, they do not engender diversity. Instead, we make a principled argument for using k-Means clustering to create diversity. When creating multiple clusterings with multiple values of k, there is a risk that different clusterings discover the same clusters, which would then train identical base-classifiers and bias the ensemble voting process. We propose a new approach that uses the Jaccard Index to detect and remove similar clusters before the base-classifiers are trained, reducing classification error by removing repeated votes. We demonstrate the effectiveness of the proposed approach by comparing it against three state-of-the-art ensemble algorithms on eight UCI datasets.
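To make the idea concrete, the following is a minimal sketch of the de-duplication step the abstract describes: generate clusters with k-Means at several values of k, then keep a cluster only if its Jaccard similarity with every already-kept cluster falls below a threshold. The dataset, the set of k values, and the 0.8 threshold are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification


def jaccard_index(a, b):
    """Jaccard similarity between two clusters, each a set of sample indices."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# Stand-in data (the paper uses eight UCI datasets).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Cluster the training data several times with different k values.
clusters = []  # each entry: frozenset of sample indices forming one cluster
for k in (2, 3, 4, 5):  # assumed range of k values
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    for c in range(k):
        clusters.append(frozenset(np.where(labels == c)[0]))

# Keep a cluster only if it is not too similar to any already-kept cluster.
THRESHOLD = 0.8  # assumed similarity cut-off; the paper's value may differ
kept = []
for cluster in clusters:
    if all(jaccard_index(cluster, other) < THRESHOLD for other in kept):
        kept.append(cluster)

print(f"{len(clusters)} clusters generated, {len(kept)} kept after de-duplication")
# One base-classifier would then be trained on the samples of each kept cluster,
# so near-duplicate clusters no longer contribute repeated votes to the ensemble.
```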
Acknowledgments
This research was supported by the Australian Research Council’s Discovery Project funding scheme (Project Number DP160102639).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Fletcher, S., Verma, B. (2017). Removing Bias from Diverse Data Clusters for Ensemble Classification. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, E.S. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10637. Springer, Cham. https://doi.org/10.1007/978-3-319-70093-9_15
DOI: https://doi.org/10.1007/978-3-319-70093-9_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70092-2
Online ISBN: 978-3-319-70093-9