DOI: 10.1145/3205651.3208245
Research article

Optimizing clustering to promote data diversity when generating an ensemble classifier

Published: 06 July 2018

ABSTRACT

In this paper, we propose a method for generating an optimized ensemble classifier. In the proposed method, a diverse input space is created by clustering the training data incrementally within a cycle, where a cycle is one complete round of clustering, training, and error calculation. In each cycle, a random upper bound on the number of clusters is chosen and data clusters are generated. A set of heterogeneous classifiers is trained on all generated clusters to promote structural diversity. An ensemble classifier is formed in each cycle and the generalization error of that ensemble is calculated. This process is optimized to find the set of classifiers with the lowest generalization error, and the optimization terminates when the generalization error can no longer be reduced. The cycle with the lowest error is then selected, and all trained classifiers of that cycle are passed to the next stage. Any classifier whose accuracy is below the average accuracy of the pool is discarded; the remaining classifiers form the proposed ensemble classifier. The proposed ensemble classifier is tested on classification benchmark datasets from the UCI repository, and the results are compared with existing state-of-the-art ensemble methods, including Bagging and Boosting. It is demonstrated that the proposed ensemble classifier performs better than the existing ensemble methods.
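The abstract describes the method only at a high level, so the following is a minimal sketch, in Python with scikit-learn, of how such a cycle could look. Everything specific here is an assumption rather than the authors' exact procedure: the run_cycle and build_ensemble helpers are hypothetical names, k-means is used for clustering, and the base-learner pool (decision tree, naive Bayes, k-NN), the hold-out error estimate, the majority vote, and the patience-based stopping rule are all illustrative choices.

    # Illustrative sketch only; not the authors' implementation.
    import numpy as np
    from sklearn.base import clone
    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)

    # Heterogeneous pool of base learners (an assumption; the abstract does
    # not list the exact classifier types).
    BASE_LEARNERS = [
        DecisionTreeClassifier(random_state=0),
        GaussianNB(),
        KNeighborsClassifier(n_neighbors=3),
    ]

    def run_cycle(X_tr, y_tr, X_val, y_val, k_max):
        """One cycle: cluster incrementally (k = 2 .. k_max), train every
        base learner on every usable cluster, then measure the validation
        error of the majority-vote ensemble over the whole pool."""
        pool = []
        for k in range(2, k_max + 1):
            labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_tr)
            for c in range(k):
                Xc, yc = X_tr[labels == c], y_tr[labels == c]
                # Skip clusters too small or too pure to train on.
                if len(yc) < 5 or len(np.unique(yc)) < 2:
                    continue
                pool.extend(clone(b).fit(Xc, yc) for b in BASE_LEARNERS)
        if not pool:                      # degenerate cycle: nothing trained
            return pool, 1.0
        preds = np.array([m.predict(X_val) for m in pool])  # (n_models, n_val)
        vote = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)
        return pool, float(np.mean(vote != y_val))

    def build_ensemble(X, y, max_cycles=20, patience=5):
        X_tr, X_val, y_tr, y_val = train_test_split(
            X, y, test_size=0.3, random_state=0, stratify=y)
        best_err, best_pool, stall = np.inf, [], 0
        for _ in range(max_cycles):
            k_max = int(rng.integers(2, 8))   # random clustering upper bound
            pool, err = run_cycle(X_tr, y_tr, X_val, y_val, k_max)
            if err < best_err:
                best_err, best_pool, stall = err, pool, 0
            else:
                stall += 1
            if stall >= patience:             # error no longer being reduced
                break
        # Final pruning: drop classifiers whose individual validation
        # accuracy falls below the pool average.
        accs = np.array([m.score(X_val, y_val) for m in best_pool])
        return [m for m, a in zip(best_pool, accs) if a >= accs.mean()]

    if __name__ == "__main__":
        X, y = load_iris(return_X_y=True)
        ensemble = build_ensemble(X, y)
        print(f"pruned ensemble size: {len(ensemble)}")

The pruning step at the end mirrors the abstract directly: once the best cycle is found, any classifier scoring below the pool's average accuracy is discarded and the survivors form the final ensemble.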

Published in

GECCO '18: Proceedings of the Genetic and Evolutionary Computation Conference Companion
July 2018, 1968 pages
ISBN: 9781450357647
DOI: 10.1145/3205651
Copyright © 2018 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 1,669 of 4,410 submissions, 38%
