Abstract
Evolutionary Algorithms (EAs) are population-based, stochastic search algorithms that mimic natural evolution. Over the years, EAs have been successfully applied to many classification problems. In this paper, we present three novel evolutionary approaches for synthesizing classifiers in supervised data mining scenarios and analyze their performance. The first approach encodes rule sets as bit string genomes, while the second uses Genetic Programming (GP) to create decision trees with arbitrary expressions attached to the nodes. The novelty of these two approaches lies in the use of the solutions on the Pareto front as an ensemble. The third approach, EDDIE-101, is also based on GP but uses a new, advanced fitness measure and some novel genetic operators. We compare these approaches to a number of well-known data mining methods, including C4.5 and Random Forest, and show that our evolved classifiers can be very competitive in terms of solution quality. In addition, the proposed approaches work well across a wide range of configurations, and EDDIE-101 in particular has proven highly efficient. To further evaluate the flexibility of EDDIE-101 across different problem domains, we also test it on some real financial datasets for finding investment opportunities and compare the results with those obtained using other classifiers. Numerical experiments confirm that EDDIE-101 can be successfully extended to financial forecasting.
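The idea of using the solutions on the Pareto front as an ensemble can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: the two minimized objectives (error rate and classifier size), the majority-vote combination rule, and all names (`dominates`, `pareto_front`, `ensemble_predict`, the toy `candidates`) are assumptions made purely for illustration.

```python
from collections import Counter


def dominates(a, b):
    """True if objective vector a is no worse than b in every objective
    and strictly better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the (classifier, objectives) pairs that no other candidate dominates."""
    return [
        (clf, obj)
        for clf, obj in candidates
        if not any(dominates(other, obj) for _, other in candidates)
    ]


def ensemble_predict(front, sample):
    """Majority vote over all classifiers on the front (an assumed combination rule).
    Ties go to the class label that received its first vote earliest."""
    votes = Counter(clf(sample) for clf, _ in front)
    return votes.most_common(1)[0][0]


# Toy candidates: (classifier, (error rate, size)). Values are made up.
candidates = [
    (lambda x: int(x[0] > 0.5), (0.10, 5)),
    (lambda x: int(x[1] > 0.5), (0.20, 3)),
    (lambda x: 1, (0.40, 1)),
    (lambda x: 0, (0.45, 4)),  # dominated by (0.20, 3), so it is excluded
]

front = pareto_front(candidates)
print(len(front))                           # number of non-dominated classifiers
print(ensemble_predict(front, (0.9, 0.1)))  # class chosen by majority vote
```

In this sketch no single trade-off between the objectives has to be chosen in advance: every non-dominated classifier contributes one vote to the final decision.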
Notes
A similar approach has been suggested by Muni et al. [45].
EDDIE-101 does not use f_1 as its objective function, but we computed its f_1 values for the convergence analysis here.
References
Alba Torres E, Tomassini M (2002) Parallelism and evolutionary algorithms. IEEE Trans Evol Comput 6(5):443–462
Anderson E (1935) The irises of the Gaspé Peninsula. Bull Am Iris Soc 59:2–5
Au WH, Chan KCC, Yao X (2003) A novel evolutionary data mining algorithm with applications to churn prediction. IEEE Trans Evol Comput 7(6):532–545
Avnimelech R, Intrator N (1999) Boosting regression estimators. Neural Comput 11(2):499–520
Bacardit J, Butz MV (2007) Data mining in learning classifier systems: comparing XCS with GAssist. In: Revised selected papers of the international workshops on learning classifier systems, Springer, Lecture notes in artificial intelligence, vol 4399, pp 282–290
Bäck T (1996) Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms. Oxford University Press, Oxford
Bako L (2010) Real-time classification of datasets with hardware embedded neuromorphic neural networks. Briefings Bioinf 11(3):348–363
Balian R (2004) Entropy, a protean concept. In: Poincaré seminar 2003, Birkhäuser Verlag, Progress in mathematical physics, vol 38, pp 119–144
Bernadó E, Llorà X, Garrell i Guiu JM (2001) XCS and GALE: a comparative study of two learning classifier systems with six other learning algorithms on classification tasks. In: Advances in learning classifier systems, Revised Papers of IWLCS’01, Springer, Lecture notes in artificial intelligence, vol 2321, pp 115–132
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Bull L, Bernadó-Mansilla E, Holmes J (eds) (2008) Learning classifier systems in data mining. In: Studies in computational intelligence, vol 125. Springer, Berlin
Cantú-Paz E, Kamath C (2000) Using evolutionary algorithms to induce oblique decision trees. In: Proceedings of the genetic and evolutionary computation conference, Morgan Kaufmann Publishers, pp 1053–1060
Chiong R (ed) (2009) Nature-inspired algorithms for optimisation. In: Studies in computational intelligence, vol 193. Springer, Berlin
Chiong R, Neri F, McKay RI (2009) Nature that breeds solutions. In: Nature-inspired informatics for intelligent applications and knowledge discovery: implications in business, science and engineering, information science reference, chap 1, pp 1–24
Corcoran AL, Sen S (1994) Using real-valued genetic algorithms to evolve rule sets for classification. In: Proceedings of the first IEEE conference on evolutionary computation, IEEE computer society, vol 1, pp 120–124
De Jong KA, Spears WM (1991) Learning concept classification rules using genetic algorithms. In: Mylopoulos J, Reiter R (eds) Proceedings of the 12th international joint conference on artificial intelligence, Morgan Kaufmann Publishers, vol 2, pp 651–656
Fernández A, García S, Luengo J, Bernadó-Mansilla E, Herrera F (2010) Genetics-based machine learning for rule induction: state of the art, taxonomy, and comparative study. IEEE Trans Evol Comput. doi:10.1109/TEVC.2009.2039140
Fidelis M, Lopes HS, Freitas AA (2000) Discovering comprehensible classification rules with a genetic algorithm. In: Proceedings of the IEEE congress on evolutionary computation, IEEE computer society, vol 1, pp 805–810
Forina M, Lanteri S, Armanino C et al (1988) PARVUS—an extendible package for data exploration, classification and correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Genoa
Forsyth R (1981) BEAGLE—a Darwinian approach to pattern recognition. Kybernetes 10(3):159–166
Frank E, Hall MA, Holmes G, Kirkby R, Pfahringer B, Witten IH, Trigg L (2005) WEKA—a machine learning workbench for data mining. In: The data mining and knowledge discovery Handbook, Springer, chap 62, pp 1305–1314
Frawley WJ, Piatetsky-Shapiro G, Matheus CJ (1992) Knowledge discovery in databases: an overview. AI Mag 13(3):213–228
Freitas AA (1997) A genetic programming framework for two data mining tasks: classification and generalized rule induction. In: Proceedings of the second annual conference on genetic programming, Morgan Kaufmann Publishers, pp 96–101
Freitas AA (2002) Data mining and knowledge discovery with evolutionary algorithms. Natural computing series. Springer, Berlin
García-Almanza AL, Tsang EPK (2006) The repository method for chance discovery in financial forecasting. In: Proceedings of the 10th international conference on knowledge-based intelligent information and engineering systems, Part III, Springer, Lecture notes in artificial intelligence, vol 4253, pp 30–37
García-Almanza AL, Tsang EPK, Galván-López E (2008) Evolving decision rules to discover patterns in financial data sets. In: Computational methods in financial engineering—essays in honour of Manfred Gilli, Springer, chap II-5, pp 239–255
Gehrke J, Ramakrishnan R, Ganti V (1998) RainForest—a framework for fast decision tree construction of large datasets. In: Proceedings of 24th international conference on very large data bases, Morgan Kaufmann Publishers, pp 416–427
Ghosh A, Jain LC (eds) (2005) Evolutionary computation in data mining. In: Studies in fuzziness and soft computing, vol 163. Springer, Berlin
Gong G, Cestnik B (1988) Hepatitis data set. UCI Machine Learning Repository, University of California
Grzymala-Busse JW (1997) A new version of the rule induction system LERS. Fundamenta Informaticae – Annales Societatis Mathematicae Polonae, Series IV 31(1):27–39
Harding JA, Shahbaz M, Srinivas, Kusiak A (2006) Data mining in manufacturing: a review. J Manuf Sci Eng 128(4):969–977
Holland JH (1986) Escaping brittleness: the possibilities of general-purpose learning algorithms applied to parallel rule-based systems. In: Machine learning: an artificial intelligence approach, vol II, William Kaufmann, pp 593–623
Holmes G, Donkin A, Witten IH (1994) WEKA: a machine learning workbench. In: Proceedings of the second Australia and New Zealand conference on intelligent information systems, IEEE Computer Society Press, pp 357–361
Hsu PL, Lai R, Chiu CC (2003) The hybrid of association rule algorithms and genetic algorithms for tree induction: an example of predicting the student course performance. Expert Syst Appl Int J 25(1):51–62
Jabeen H, Baig AR (2010) Review of classification using genetic programming. Int J Eng Sci Technol 2(2):94–103
Kharbat F, Bull L, Odeh M (2007) Mining breast cancer data with XCS. In: Proceedings of 9th genetic and evolutionary computation conference, ACM Press, pp 2066–2073
Koza JR (1990) Concept formation and decision tree induction using the genetic programming paradigm. In: Proceedings of the 1st workshop on parallel problem solving from nature, Springer, Lecture notes in computer science, vol 496, pp 124–128
Koza JR (1992) Genetic programming: on the programming of computers by means of natural selection. Bradford Books, MIT Press, Cambridge
Li J (2001) FGP: a genetic programming based tool for financial forecasting. PhD thesis, University of Essex
Li J, Li X, Yao X (2005) Cost-sensitive classification with genetic programming. In: Proceedings of the IEEE congress on evolutionary computation, IEEE Computer Society, pp 2114–2121
Liu TY, Yang Y, Wan H, Zeng HJ, Chen Z, Ma WY (2005) Support vector machines classification with a very large-scale taxonomy. ACM SIGKDD Explor Newsl 7(1):36–43
Martin WN, Lienig J, Cohoon JP (1997) Island (Migration) models: evolutionary algorithms based on punctuated equilibria. In: Handbook of evolutionary computation, computational intelligence library, Oxford University Press, chap C6.3, pp 448–463
Mehta M, Agrawal R, Rissanen J (1996) SLIQ: a fast scalable classifier for data mining. In: Advances in database technology—5th international conference on extending database technology, Springer, Lecture notes in computer science, vol 1057, pp 18–32
Muni DP, Pal NR, Das J (2004) A novel approach to design classifiers using genetic programming. IEEE Trans Evol Comput 8(2):183–196
Orriols-Puig A, Bernadó-Mansilla E (2009) Evolutionary rule-based systems for imbalanced data sets. Soft Comput 13(3):213–225
Orriols-Puig A, Casillas J, Bernadó-Mansilla E (2008) Genetic-based machine learning systems are competitive for pattern recognition. Evol Intell 1(3):209–232
Orriols-Puig A, Casillas J, Bernadó-Mansilla E (2009) Fuzzy-UCS: a Michigan-style learning fuzzy-classifier system for supervised learning. IEEE Trans Evol Comput 13(2):260–283
Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann Publishers, Massachusetts
Rastogi R, Shim K (1998) PUBLIC: a decision tree classifier that integrates building and pruning. In: Proceedings of 24th international conference on very large data bases. Morgan Kaufmann Publishers, Massachusetts, pp 404–415
Schapire RE (1990) The strength of weak learnability. Mach Learn 5:197–227
Shafer JC, Agrawal R, Mehta M (1996) SPRINT: a scalable parallel classifier for data mining. In: Proceedings of 22nd international conference on very large data bases. Morgan Kaufmann Publishers, Massachusetts, pp 544–555
Siegel S, Castellan NJ Jr (1956) Nonparametric statistics for the behavioral sciences. Humanities/social sciences/languages. McGraw-Hill, New York
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7:179–188
Smith SF (1980) A learning system based on genetic adaptive algorithms. PhD thesis, University of Pittsburgh
Spears WM, De Jong KA (1990) Using genetic algorithms for supervised concept learning. In: Proceedings of the 2nd international IEEE conference on tools for artificial intelligence, IEEE Computer Society Press, pp 335–341
Stefanowski J, Slowinski K (1996) Rough sets as a tool for studying attribute dependencies in the urinary stones treatment data set. In: Rough sets and data mining: analysis of imprecise data. Kluwer Academic Publishers, pp 177–196
Tanwani AK, Farooq M (2009) Performance evaluation of evolutionary algorithms in classification of biomedical datasets. In: Proceedings of the 11th annual conference—companion on genetic and evolutionary computation conference, ACM, pp 2617–2624
Tapia JJ, Morett E, Vallejo EE (2009) A clustering genetic algorithm for genomic data mining. In: Foundations of computational intelligence—vol 4: bio-inspired data mining. Studies in computational intelligence, vol 204, Springer, pp 249–275
Tsang EPK, Butler JM, Li J (1998) EDDIE beats the bookies. Int J Softw Pract Exper 28(10):1033–1043
Tsang EPK, Li J, Markose SM, Hakan ER, Salhi A, Iori G (2000) EDDIE in financial decision making. J Manage Econ 4(4)
Tsang EPK, Yung P, Li J (2004) EDDIE-automation—a decision support tool for financial forecasting. Decis Support Syst 37(4):559–565
van Veldhuizen DA, Merkle LD (1999) Multiobjective evolutionary algorithms: classifications, analyses, and new innovations. PhD thesis, Air University, Air Force Institute of Technology: Wright-Patterson Air Force Base, OH, USA
Weise T (2009a) Evolving distributed algorithms with genetic programming. PhD thesis, University of Kassel, Distributed Systems Group: Kassel, Germany
Weise T (2009b) Global optimization algorithms—theory and application. http://www.it-weise.de/
Weise T, Geihs K (2006) DGPF—an adaptable framework for distributed multi-objective search algorithms applied to the genetic programming of sensor networks. In: Proceedings of the second international conference on bioinspired optimization methods and their applications, Jožef Stefan Institute, Informacijska Družba, pp 157–166
Wilcoxon F (1945) Individual comparisons by ranking methods. Biometrics Bull 1(6):80–83
Wolberg WH, Mangasarian O (1989) Breast cancer Wisconsin (Original) data set. UCI Machine Learning Repository, University of California
Zhang GP (2000) Neural networks for classification: a survey. IEEE Trans Syst Man Cybern Part C Appl Rev 30(4):451–462
Zitzler E, Deb K, Thiele L (2000) Comparison of multiobjective evolutionary algorithms: empirical results. Evol Comput 8(2):173–195
About this article
Cite this article
Wang, P., Weise, T. & Chiong, R. Novel evolutionary algorithms for supervised classification problems: an experimental study. Evol. Intel. 4, 3–16 (2011). https://doi.org/10.1007/s12065-010-0047-7
DOI: https://doi.org/10.1007/s12065-010-0047-7