Abstract
Learning algorithms can suffer a performance bias when data sets have only a small number of training examples for one or more classes. In this scenario, learning methods can produce deceptively “good looking” results even when classification performance on the important minority class is poor. This paper compares two Genetic Programming (GP) approaches for classification with unbalanced data. The first adapts the fitness function to evolve classifiers with good classification ability across both the minority and majority classes. The second uses a multi-objective approach to simultaneously evolve a Pareto front (or set) of classifiers along the minority and majority class trade-off surface. Our results show that solutions with good classification ability were evolved across a range of binary classification tasks with unbalanced data.
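The two ideas in the abstract can be illustrated with a small sketch. It is not the fitness function or the multi-objective algorithm used in the paper: the names `class_accuracies`, `balanced_fitness`, `dominates` and `pareto_front` are hypothetical, the fitness is assumed to be the plain average of the two per-class accuracies, and the front is found by a simple pairwise dominance check.

```python
from typing import List, Sequence, Tuple


def class_accuracies(y_true: Sequence[int], y_pred: Sequence[int],
                     minority: int = 1) -> Tuple[float, float]:
    """Return (minority accuracy, majority accuracy) for a binary task.

    Assumes both classes occur at least once in y_true.
    """
    min_total = sum(1 for t in y_true if t == minority)
    maj_total = len(y_true) - min_total
    min_hits = sum(1 for t, p in zip(y_true, y_pred) if t == minority and p == t)
    maj_hits = sum(1 for t, p in zip(y_true, y_pred) if t != minority and p == t)
    return min_hits / min_total, maj_hits / maj_total


def overall_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all examples classified correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def balanced_fitness(y_true: Sequence[int], y_pred: Sequence[int],
                     minority: int = 1) -> float:
    """Assumed fitness: equal-weight average of the two class accuracies."""
    min_acc, maj_acc = class_accuracies(y_true, y_pred, minority)
    return 0.5 * (min_acc + maj_acc)


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """a dominates b if it is no worse on both objectives and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep the points that no other point dominates (both objectives maximised)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# 95 majority examples (label 0), 5 minority examples (label 1);
# a classifier that always predicts the majority class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
acc = overall_accuracy(y_true, y_pred)   # high, yet every minority example is missed
fit = balanced_fitness(y_true, y_pred)   # exposes the failure on the minority class

# Hypothetical (minority accuracy, majority accuracy) pairs for four classifiers.
candidates = [(0.0, 1.0), (0.8, 0.7), (0.6, 0.6), (1.0, 0.2)]
front = pareto_front(candidates)
```

Here the always-majority classifier scores 0.95 on overall accuracy but only 0.5 on the averaged fitness, which is the "good looking" effect the abstract warns about. In the second part, the pair (0.6, 0.6) drops out of the front because (0.8, 0.7) is better on both classes, while the remaining three classifiers represent different trade-offs between minority and majority accuracy.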
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bhowan, U., Zhang, M., Johnston, M. (2010). Genetic Programming for Classification with Unbalanced Data. In: Esparcia-Alcázar, A.I., Ekárt, A., Silva, S., Dignum, S., Uyar, A.Ş. (eds) Genetic Programming. EuroGP 2010. Lecture Notes in Computer Science, vol 6021. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12148-7_1
DOI: https://doi.org/10.1007/978-3-642-12148-7_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12147-0
Online ISBN: 978-3-642-12148-7
eBook Packages: Computer Science, Computer Science (R0)