Abstract
Experimental analysis of the performance of a proposed method is a crucial and necessary task in research. This paper focuses on the statistical analysis of results in the field of genetics-based machine learning. It presents a study of a set of techniques that enable a rigorous comparison among algorithms in terms of obtaining successful classification models. Two accuracy measures for multi-class problems have been employed: classification rate and Cohen's kappa. Furthermore, two interpretability measures have been employed: size of the rule set and number of antecedents. We have studied whether the samples of results obtained by genetics-based classifiers, using the performance measures cited above, satisfy the necessary conditions for being analysed by means of parametric tests. The results obtained show that the fulfilment of these conditions is problem-dependent and inconsistent, which supports the use of non-parametric statistics in the experimental analysis. In addition, non-parametric tests can be satisfactorily employed for comparing generic classifiers over various data-sets considering any performance measure. On these grounds, we propose the use of the most powerful non-parametric statistical tests to carry out multiple comparisons. However, the statistical analysis conducted on interpretability must be considered with care.
References
Aguilar-Ruiz JS, Giráldez R, Riquelme JC (2000) Natural encoding for evolutionary supervised learning. IEEE Trans Evol Comput 11(4):466–479
Alcalá-Fdez J, Sánchez L, García S, del Jesus MJ, Ventura S, Garrell JM, Otero J, Romero C, Bacardit J, Rivas VM, Fernández JC, Herrera F (2009) KEEL: a software tool to assess evolutionary algorithms to data mining problems. Soft Comput 13(3):307–318
Alpaydin E (2004) Introduction to machine learning. MIT Press, Cambridge
Anglano C, Botta M (2002) NOW G-Net: learning classification programs on networks of workstations. IEEE Trans Evol Comput 6(13):463–480
Asuncion A, Newman DJ (2007) UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA. http://www.ics.uci.edu/~mlearn/MLRepository.htm
Bacardit J (2004) Pittsburgh genetic-based machine learning in the data mining era: representations, generalization and run-time. Dissertation, Department of Computer Science, Ramon Llull University, Barcelona, Spain
Bacardit J, Garrell JM (2003) Evolving multiple discretizations with adaptive intervals for a pittsburgh rule-based learning classifier system. In: Proceedings of the genetic and evolutionary computation conference (GECCO’03), vol 2724. LNCS, Germany, pp 1818–1831
Bacardit J, Garrell JM (2004) Analysis and improvements of the adaptive discretization intervals knowledge representation. In: Proceedings of the genetic and evolutionary computation conference (GECCO’04), vol 3103. LNCS, Germany, pp 726–738
Bacardit J, Garrell JM (2007) Bloat control and generalization pressure using the minimum description length principle for Pittsburgh approach learning classifier system. In: Kovacs T, Llorá X, Takadama K (eds) Advances at the frontier of learning classifier systems, vol 4399. LNCS, USA, pp 61–80
Barandela R, Sánchez JS, García V, Rangel E (2003) Strategies for learning in class imbalance problems. Pattern Recognit 36(3):849–851
Ben-David A (2007) A lot of randomness is hiding in accuracy. Eng Appl Artif Intell 20:875–885
Bernadó-Mansilla E, Garrell JM (2003) Accuracy-based learning classifier systems: models, analysis and applications to classification tasks. Evol Comput 11(3):209–238
Bernadó-Mansilla E, Ho TK (2005) Domain of competence of XCS classifier system in complexity measurement space. IEEE Trans Evol Comput 9(1):82–104
Clark P, Niblett T (1989) The CN2 induction algorithm. Machine Learn 3(4):261–283
Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20:37–46
Corcoran AL, Sen S (1994) Using real-valued genetic algorithms to evolve rule sets for classification. In: Proceedings of the IEEE conference on evolutionary computation, pp 120–124
De Jong KA, Spears WM, Gordon DF (1993) Using genetic algorithms for concept learning. Machine Learn 13:161–188
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Machine Learn Res 7:1–30
Drummond C, Holte RC (2006) Cost curves: an improved method for visualizing classifier performance. Machine Learn 65(1):95–130
Freitas AA (2002) Data mining and knowledge discovery with evolutionary algorithms. Springer, Berlin
Grefenstette JJ (1993) Genetic algorithms for machine learning. Kluwer, Norwell
Guan SU, Zhu F (2005) An incremental approach to genetic-algorithms-based classification. IEEE Trans Syst Man Cybern B 35(2):227–239
Hekanaho J (1998) An evolutionary approach to concept learning. Dissertation, Department of Computer Science, Abo akademi University, Abo, Finland
Hochberg Y (1988) A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75:800–802
Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6:65–70
Huang J, Ling CX (2005) Using AUC and accuracy in evaluating learning algorithms. IEEE Trans Knowl Data Eng 17(3):299–310
Iman RL, Davenport JM (1980) Approximations of the critical region of the Friedman statistic. Commun Stat 18:571–595
Jiao L, Liu J, Zhong W (2006) An organizational coevolutionary algorithm for classification. IEEE Trans Evol Comput 10(1):67–80
Koch GG (1970) The use of non-parametric methods in the statistical analysis of a complex split plot experiment. Biometrics 26(1):105–128
Landgrebe TCW, Duin RPW (2008) Efficient multiclass ROC approximation by decomposition via confusion matrix perturbation analysis. IEEE Trans Pattern Anal Mach Intell 30(5):810–822
Lim T-S, Loh W-Y, Shih Y-S (2000) A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learn 40(3):203–228
Markatou M, Tian H, Biswas S, Hripcsak G (2005) Analysis of variance of cross-validation estimators of the generalization error. J Machine Learn Res 6:1127–1168
Rivest RL (1987) Learning decision lists. Machine Learn 2:229–246
Shaffer JP (1995) Multiple hypothesis testing. Annu Rev Psychol 46:561–584
Sheskin DJ (2006) Handbook of parametric and nonparametric statistical procedures. Chapman & Hall/CRC, London
Sigaud O, Wilson SW (2007) Learning classifier systems: a survey. Soft Comput 11:1065–1078
Sokolova M, Japkowicz N, Szpakowicz S (2006) Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation. In: Australian conference on artificial intelligence, vol 4304. LNCS, Germany, pp 1015–1021
Tan KC, Yu Q, Ang JH (2006) A coevolutionary algorithm for rules discovery in data mining. Int J Syst Sci 37(12):835–864
Tulai AF, Oppacher F (2004) Multiple species weighted voting: a genetics-based machine learning system. In: Proceedings of the genetic and evolutionary computation conference (GECCO'04), vol 3103. LNCS, Germany, pp 1263–1274
Venturini G (1993) SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts. In: Proceedings of the machine learning ECML’93, vol 667. LNAI, Germany, pp 280–296
Wilson SW (1994) ZCS: a zeroth order classifier system. Evol Comput 2:1–18
Wilson SW (1995) Classifier fitness based on accuracy. Evol Comput 3(2):149–175
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco
Wright SP (1992) Adjusted p-values for simultaneous inference. Biometrics 48:1005–1013
Youden W (1950) Index for rating diagnostic tests. Cancer 3:32–35
Zar JH (1999) Biostatistical analysis. Prentice Hall, Englewood Cliffs
Acknowledgments
The study was supported by the Spanish Ministry of Science and Technology under Project TIN-2005-08386-C05-01. J. Luengo holds an FPU scholarship from the Spanish Ministry of Education and Science. The authors are very grateful to the anonymous reviewers for their valuable suggestions and comments, which helped improve the quality of this paper. We are also very grateful to Prof. Bacardit, Prof. Bernadó-Mansilla and Prof. Aguilar-Ruiz for providing the KEEL software with the GASSIST-ADI, XCS and HIDER algorithms, respectively.
Appendix A: Genetic algorithms in classification
Here we give a more detailed description of the methods employed in our work, covering the main components, structure and operation of each. For further details about these methods, please refer to the corresponding references.
Pitts-GIRLA algorithm. The Pittsburgh genetic interval rule learning algorithm (Pitts-GIRLA) (Corcoran and Sen 1994) is a GBML method that follows the Pittsburgh approach to perform classification tasks. Each attribute condition is encoded by two real values indicating the minimum and maximum of an interval; a "don't care" condition occurs when the maximum value is lower than the minimum value.
This algorithm employs three different operators: modified simple (one point) crossover, creep mutation and simple random mutation.
XCS algorithm. XCS (Wilson 1995) is an LCS that evolves online a set of rules that accurately describe the feature space. In the following we present the components of this algorithm in detail:
1. Interaction with the environment: In keeping with the typical LCS model, the environment provides as input to the system a series of sensory situations σ(t) ∈ {0,1}^L, where L is the number of bits in each situation. In response, the system executes actions α(t) ∈ {a_1, …, a_n} upon the environment. Each action results in a scalar reward ρ(t).
2. A classifier in XCS: XCS keeps a population of classifiers which represent its knowledge about the problem. Each classifier is a condition-action-prediction rule with the following parts: the condition C ∈ {0,1,#}^L, the action A ∈ {a_1, …, a_n} and the prediction p. Furthermore, each classifier keeps additional parameters such as the prediction error ε, the fitness F, the experience exp, the time stamp ts, the action set size as and the numerosity.
3. The different sets: Four different sets need to be considered in XCS: the population [P], the match set [M], the action set [A] and the previous action set [A_{-1}].
As a result, the knowledge acquired by the algorithm is represented by a set of rules (classifiers), each with an associated fitness. When classifying unseen examples, each rule that matches the input votes according to its prediction and fitness, and the class with the most votes is chosen as the output.
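The voting scheme just described can be sketched as follows (an illustrative reconstruction, not the full XCS algorithm; the field names and data layout are our own assumptions):

```python
# Sketch of classifying an unseen binary situation with a final XCS rule
# population: each matching classifier votes for its action with weight
# prediction * fitness, and the action with the highest total vote wins.

def ternary_match(condition, situation):
    """condition is a string over {'0', '1', '#'}; '#' matches either bit."""
    return all(c == '#' or c == s for c, s in zip(condition, situation))

def classify(population, situation):
    """Fitness-weighted vote among the classifiers matching `situation`."""
    votes = {}
    for cl in population:
        if ternary_match(cl["condition"], situation):
            votes[cl["action"]] = (votes.get(cl["action"], 0.0)
                                   + cl["prediction"] * cl["fitness"])
    return max(votes, key=votes.get) if votes else None
```

For example, with two classifiers {condition "1#0", action 0, prediction 900, fitness 0.8} and {condition "1##", action 1, prediction 200, fitness 0.9}, the situation "110" matches both, and action 0 wins with a vote of 720 against 180.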
GASSIST algorithm. Genetic Algorithms based claSSIfier sySTem (GASSIST) (Bacardit and Garrell 2007) is a Pittsburgh-style classifier system based on GABIL (De Jong et al. 1993), from which it takes the semantically correct crossover operator. The main features of this classifier system are as follows:
1. General operators and policies:
- Matching strategy: The matching process follows an "if ... then ... else if ... then ..." structure, usually called a decision list (Rivest 1987).
- Mutation operators: When an individual is selected for mutation, a random gene inside its chromosome is chosen to be mutated.
2. Control of the individuals' length: This control is achieved using two different operators:
- Rule deletion: This operator deletes the rules of an individual that do not match any training example.
- Selection bias using the individual size: Tournament selection is used, where the tournament criterion is given by an operator called "hierarchical selection", defined as follows:
  - If |accuracy_a − accuracy_b| < threshold then:
    - If length_a < length_b then a is better than b.
    - If length_a > length_b then b is better than a.
    - If length_a = length_b then we use the general case.
  - Otherwise, we use the general case: we select the individual with the higher fitness.
3.
Knowledge representations
- Rule representation for symbolic or discrete attributes: GASSIST uses the GABIL (De Jong et al. 1993) representation for this kind of attribute.
- Rule representation for real-valued attributes: In GASSIST-ADI, the representation is based on the adaptive discretization intervals rule representation (Bacardit and Garrell 2003; Bacardit 2004).
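The "hierarchical selection" tournament criterion described above can be sketched as follows (an illustrative reconstruction; the dict layout and threshold value are our own assumptions, not from the paper):

```python
# Sketch of GASSIST's hierarchical selection: when two individuals perform
# almost equally (accuracy difference below a threshold), the shorter one
# wins; otherwise the general case applies and the fitter one wins.

def hierarchical_better(a, b, threshold=0.01):
    """Return the preferred of two individuals.

    Each individual is a dict with 'accuracy', 'length' and 'fitness' keys.
    """
    if abs(a["accuracy"] - b["accuracy"]) < threshold:
        if a["length"] < b["length"]:
            return a
        if a["length"] > b["length"]:
            return b
        # equal lengths fall through to the general case
    # general case: the individual with higher fitness wins
    return a if a["fitness"] >= b["fitness"] else b
```

This bias rewards compact rule sets without sacrificing individuals whose accuracy is genuinely better.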
HIDER algorithm. HIerarchical DEcision Rules (HIDER) (Aguilar-Ruiz et al. 2000) produces a hierarchical set of rules, which may be viewed as a decision list. A real-coded GA carries out the search that extracts the rule list. The elements of this procedure are described below.
1. Coding: Each rule is represented by an individual (chromosome) in which, for each attribute, two genes define the lower and upper bounds of the rule interval.
2. Algorithm: The algorithm is a typical sequential-covering GA. It chooses the best individual of the evolutionary process, transforms it into a rule and uses that rule to eliminate data from the training file (Venturini 1993).
Initially, the set of rules R is empty; in each iteration a rule is added to R and the training file is reduced by eliminating the examples covered by the description of the rule r, independently of their class.
The main GA operators are defined as follows:
(a) Initialization: First, for each individual of the population, an example is randomly selected from the training file. Afterwards, an interval to which the example belongs is obtained.
(b) Crossover: Let [l_i^j, u_i^j] and [l_i^k, u_i^k] be the intervals of two parents, j and k, for the same attribute i. From these parents one child is generated by selecting values that satisfy l ∈ [min(l_i^j, l_i^k), max(l_i^j, l_i^k)] and u ∈ [min(u_i^j, u_i^k), max(u_i^j, u_i^k)].
(c) Mutation: A small value is subtracted from the lower boundary or added to the upper boundary, depending on which bound is mutated.
(d) Fitness function: The fitness function f considers a two-objective optimization: maximizing the number of correctly classified examples while minimizing the number of errors.
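The interval operators above can be sketched as follows (a minimal illustration under our own assumptions: function names, the per-attribute interval lists and the mutation step size are not from the paper):

```python
import random

# Sketch of the HIDER interval operators. For attribute i, parents j and k
# carry intervals [l_i^j, u_i^j] and [l_i^k, u_i^k]; the child draws each
# bound uniformly from the range spanned by the parents' respective bounds.

def crossover(parent_j, parent_k):
    """Each parent is a list of (l, u) intervals, one per attribute."""
    child = []
    for (lj, uj), (lk, uk) in zip(parent_j, parent_k):
        l = random.uniform(min(lj, lk), max(lj, lk))
        u = random.uniform(min(uj, uk), max(uj, uk))
        child.append((l, u))
    return child

def mutate(interval, delta=0.05):
    """Subtract `delta` from the lower bound or add it to the upper bound."""
    l, u = interval
    if random.random() < 0.5:
        return (l - delta, u)   # widen downwards
    return (l, u + delta)       # widen upwards
```

Note that this mutation never shrinks an interval: it always widens it by delta on one side, which pushes rules towards greater generality.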
García, S., Fernández, A., Luengo, J. et al. A study of statistical techniques and performance measures for genetics-based machine learning: accuracy and interpretability. Soft Comput 13, 959–977 (2009). https://doi.org/10.1007/s00500-008-0392-y