DOI: 10.1145/1830483.1830639

AUC analysis of the pareto-front using multi-objective GP for classification with unbalanced data

Published: 07 July 2010

Abstract

Learning algorithms can suffer a performance bias when data sets are unbalanced. This paper proposes a Multi-Objective Genetic Programming (MOGP) approach that uses the accuracies of the minority and majority classes as competing learning objectives. We focus our analysis on the classification ability of the evolved Pareto-front solutions, measured by the Area Under the ROC Curve (AUC), and investigate which regions of the objective trade-off surface favour high-scoring AUC solutions. We show that the MOGP approach simultaneously evolves a diverse set of well-performing classifiers along the Pareto front, whereas canonical GP finds only a single solution on the objective trade-off surface, and that on some problems the MOGP solutions achieved better AUC than solutions evolved with canonical GP using hand-crafted fitness functions.
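To make the quantities in the abstract concrete, the sketch below is an illustrative reconstruction (not the paper's implementation; the function names and the score-above-zero decision rule are assumptions) of the two MOGP objectives (minority- and majority-class accuracy), the rank-statistic (Wilcoxon-Mann-Whitney) form of AUC, and the extraction of the Pareto front from a population of (minority accuracy, majority accuracy) points.

```python
def class_accuracies(scores, labels, minority=1, threshold=0.0):
    """The two learning objectives: accuracy on each class, where a score
    above `threshold` predicts the minority class (an assumed decision rule)."""
    min_hits = sum(1 for s, y in zip(scores, labels) if y == minority and s > threshold)
    maj_hits = sum(1 for s, y in zip(scores, labels) if y != minority and s <= threshold)
    n_min = sum(1 for y in labels if y == minority)
    n_maj = len(labels) - n_min
    return min_hits / n_min, maj_hits / n_maj

def auc(scores, labels, minority=1):
    """AUC via the Wilcoxon-Mann-Whitney statistic: the probability that a
    randomly chosen minority example outscores a randomly chosen majority one
    (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == minority]
    neg = [s for s, y in zip(scores, labels) if y != minority]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def pareto_front(points):
    """Keep the non-dominated (minority_acc, majority_acc) pairs,
    maximising both objectives."""
    return [p for p in points
            if not any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points)]
```

For example, `pareto_front([(1.0, 0.5), (0.8, 0.9), (0.7, 0.4)])` discards only `(0.7, 0.4)`, since each of the other two points is better on one objective and worse on the other; such non-dominated sets are the "diverse set of well-performing classifiers" the abstract refers to.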




Published In

GECCO '10: Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation
July 2010
1520 pages
ISBN: 9781450300728
DOI: 10.1145/1830483
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. class imbalance
  2. classification
  3. evolutionary multi-objective optimisation
  4. genetic programming

Qualifiers

  • Research-article

Conference

GECCO '10

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%


Cited By

  • A parameter-optimization framework for neural decoding systems. Frontiers in Neuroinformatics (2023). DOI: 10.3389/fninf.2023.938689
  • Binary Differential Evolution based Feature Selection Method with Mutual Information for Imbalanced Classification Problems. 2021 IEEE Congress on Evolutionary Computation (CEC), pages 794-801 (2021). DOI: 10.1109/CEC45853.2021.9504882
  • Imbalanced Dataset Problem in Classification Algorithms. 2019 1st International Informatics and Software Engineering Conference (UBMYK), pages 1-5 (2019). DOI: 10.1109/UBMYK48245.2019.8965444
  • Multi-objective evolutionary optimization for generating ensembles of classifiers in the ROC space. Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, pages 879-886 (2012). DOI: 10.1145/2330163.2330285
