A study of statistical techniques and performance measures for genetics-based machine learning: accuracy and interpretability


Abstract

The experimental analysis of the performance of a proposed method is a crucial and necessary task in any research. This paper focuses on the statistical analysis of results in the field of genetics-based machine learning. It presents a study of techniques that can be used to rigorously compare algorithms in terms of their ability to obtain successful classification models. Two accuracy measures for multi-class problems have been employed: classification rate and Cohen's kappa. Furthermore, two interpretability measures have been employed: size of the rule set and number of antecedents. We have studied whether the samples of results obtained by genetics-based classifiers, using the performance measures cited above, satisfy the conditions required for analysis by means of parametric tests. The results show that the fulfilment of these conditions is problem-dependent and inconsistent, which supports the use of non-parametric statistics in the experimental analysis. In addition, non-parametric tests can be satisfactorily employed for comparing generic classifiers over various data sets considering any performance measure. On the basis of these facts, we propose the use of the most powerful non-parametric statistical tests to carry out multiple comparisons. However, any statistical analysis conducted on interpretability must be carefully considered.


References

  • Aguilar-Ruiz JS, Giráldez R, Riquelme JC (2000) Natural encoding for evolutionary supervised learning. IEEE Trans Evol Comput 11(4):466–479
  • Alcalá-Fdez J, Sánchez L, García S, del Jesus MJ, Ventura S, Garrell JM, Otero J, Romero C, Bacardit J, Rivas VM, Fernández JC, Herrera F (2009) KEEL: a software tool to assess evolutionary algorithms to data mining problems. Soft Comput 13(3):307–318
  • Alpaydin E (2004) Introduction to machine learning. MIT Press, Cambridge
  • Anglano C, Botta M (2002) NOW G-Net: learning classification programs on networks of workstations. IEEE Trans Evol Comput 6(13):463–480
  • Asuncion A, Newman DJ (2007) UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA. http://www.ics.uci.edu/~mlearn/MLRepository.htm
  • Bacardit J (2004) Pittsburgh genetic-based machine learning in the data mining era: representations, generalization and run-time. Dissertation, Department of Computer Science, Ramon Llull University, Barcelona, Spain
  • Bacardit J, Garrell JM (2003) Evolving multiple discretizations with adaptive intervals for a Pittsburgh rule-based learning classifier system. In: Proceedings of the genetic and evolutionary computation conference (GECCO'03), vol 2724. LNCS, Germany, pp 1818–1831
  • Bacardit J, Garrell JM (2004) Analysis and improvements of the adaptive discretization intervals knowledge representation. In: Proceedings of the genetic and evolutionary computation conference (GECCO'04), vol 3103. LNCS, Germany, pp 726–738
  • Bacardit J, Garrell JM (2007) Bloat control and generalization pressure using the minimum description length principle for a Pittsburgh approach learning classifier system. In: Kovacs T, Llorá X, Takadama K (eds) Advances at the frontier of learning classifier systems, vol 4399. LNCS, USA, pp 61–80
  • Barandela R, Sánchez JS, García V, Rangel E (2003) Strategies for learning in class imbalance problems. Pattern Recognit 36(3):849–851
  • Ben-David A (2007) A lot of randomness is hiding in accuracy. Eng Appl Artif Intell 20:875–885
  • Bernadó-Mansilla E, Garrell JM (2003) Accuracy-based learning classifier systems: models, analysis and applications to classification tasks. Evol Comput 11(3):209–238
  • Bernadó-Mansilla E, Ho TK (2005) Domain of competence of XCS classifier system in complexity measurement space. IEEE Trans Evol Comput 9(1):82–104
  • Clark P, Niblett T (1989) The CN2 induction algorithm. Machine Learn 3(4):261–283
  • Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20:37–46
  • Corcoran AL, Sen S (1994) Using real-valued genetic algorithms to evolve rule sets for classification. In: Proceedings of the IEEE conference on evolutionary computation, pp 120–124
  • De Jong KA, Spears WM, Gordon DF (1993) Using genetic algorithms for concept learning. Machine Learn 13:161–188
  • Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Machine Learn Res 7:1–30
  • Drummond C, Holte RC (2006) Cost curves: an improved method for visualizing classifier performance. Machine Learn 65(1):95–130
  • Freitas AA (2002) Data mining and knowledge discovery with evolutionary algorithms. Springer, Berlin
  • Grefenstette JJ (1993) Genetic algorithms for machine learning. Kluwer, Norwell
  • Guan SU, Zhu F (2005) An incremental approach to genetic-algorithms-based classification. IEEE Trans Syst Man Cybern B 35(2):227–239
  • Hekanaho J (1998) An evolutionary approach to concept learning. Dissertation, Department of Computer Science, Åbo Akademi University, Åbo, Finland
  • Hochberg Y (1988) A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75:800–803
  • Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6:65–70
  • Huang J, Ling CX (2005) Using AUC and accuracy in evaluating learning algorithms. IEEE Trans Knowl Data Eng 17(3):299–310
  • Iman RL, Davenport JM (1980) Approximations of the critical region of the Friedman statistic. Commun Stat 18:571–595
  • Jiao L, Liu J, Zhong W (2006) An organizational coevolutionary algorithm for classification. IEEE Trans Evol Comput 10(1):67–80
  • Koch GG (1970) The use of non-parametric methods in the statistical analysis of a complex split plot experiment. Biometrics 26(1):105–128
  • Landgrebe TCW, Duin RPW (2008) Efficient multiclass ROC approximation by decomposition via confusion matrix perturbation analysis. IEEE Trans Pattern Anal Mach Intell 30(5):810–822
  • Lim T-S, Loh W-Y, Shih Y-S (2000) A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learn 40(3):203–228
  • Markatou M, Tian H, Biswas S, Hripcsak G (2005) Analysis of variance of cross-validation estimators of the generalization error. J Machine Learn Res 6:1127–1168
  • Rivest RL (1987) Learning decision lists. Machine Learn 2:229–246
  • Shaffer JP (1995) Multiple hypothesis testing. Ann Rev Psychol 46:561–584
  • Sheskin DJ (2006) Handbook of parametric and nonparametric statistical procedures. Chapman & Hall/CRC, London/West Palm Beach
  • Sigaud O, Wilson SW (2007) Learning classifier systems: a survey. Soft Comput 11:1065–1078
  • Sokolova M, Japkowicz N, Szpakowicz S (2006) Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation. In: Australian conference on artificial intelligence, vol 4304. LNCS, Germany, pp 1015–1021
  • Tan KC, Yu Q, Ang JH (2006) A coevolutionary algorithm for rules discovery in data mining. Int J Syst Sci 37(12):835–864
  • Tulai AF, Oppacher F (2004) Multiple species weighted voting: a genetics-based machine learning system. In: Proceedings of the genetic and evolutionary computation conference (GECCO'04), vol 3103. LNCS, Germany, pp 1263–1274
  • Venturini G (1993) SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts. In: Proceedings of the machine learning ECML'93, vol 667. LNAI, Germany, pp 280–296
  • Wilson SW (1994) ZCS: a zeroth level classifier system. Evol Comput 2:1–18
  • Wilson SW (1995) Classifier fitness based on accuracy. Evol Comput 3(2):149–175
  • Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco
  • Wright SP (1992) Adjusted p-values for simultaneous inference. Biometrics 48:1005–1013
  • Youden W (1950) Index for rating diagnostic tests. Cancer 3:32–35
  • Zar JH (1999) Biostatistical analysis. Prentice Hall, Englewood Cliffs

Acknowledgments

The study was supported by the Spanish Ministry of Science and Technology under Project TIN-2005-08386-C05-01. J. Luengo holds an FPU scholarship from the Spanish Ministry of Education and Science. The authors are very grateful to the anonymous reviewers for their valuable suggestions and comments, which improved the quality of this paper. We are also very grateful to Prof. Bacardit, Prof. Bernadó-Mansilla and Prof. Aguilar-Ruiz for providing the KEEL software with the GASSIST-ADI, XCS and HIDER algorithms, respectively.

Author information


Corresponding author

Correspondence to S. García.

Appendix A: Genetic algorithms in classification

Here we give a broader description of the methods employed in our work, covering the main components, structure and operation of each one. For further details about the methods described here, please refer to the corresponding references.

Pitts-GIRLA algorithm. The Pittsburgh genetic interval rule learning algorithm (Pitts-GIRLA) (Corcoran and Sen 1994) is a GBML method that follows the Pittsburgh approach to perform classification. For each attribute, two real-valued genes indicate the minimum and maximum of the rule's interval; a "don't care" condition occurs when the maximum value is lower than the minimum value.

This algorithm employs three different operators: modified simple (one point) crossover, creep mutation and simple random mutation.
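
To make this encoding concrete, here is a minimal Python sketch (ours, not the authors' implementation) of how a Pitts-GIRLA-style rule could be matched against an example; the function and variable names are hypothetical.

    # Minimal sketch of interval matching with "don't care" conditions
    # (hypothetical names; the class gene is omitted for brevity).

    def matches(rule, example):
        """rule: one (lo, hi) pair per attribute; example: attribute values."""
        for (lo, hi), value in zip(rule, example):
            if hi < lo:                # maximum < minimum encodes "don't care"
                continue
            if not (lo <= value <= hi):
                return False
        return True

    # The second attribute is "don't care": its maximum (0.2) < minimum (0.9).
    rule = [(0.1, 0.5), (0.9, 0.2)]
    print(matches(rule, [0.3, 7.0]))   # True
    print(matches(rule, [0.6, 7.0]))   # False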

XCS algorithm. XCS (Wilson 1995) is an LCS that evolves a set of rules online to describe the feature space accurately. In the following, we present the main components of this algorithm in detail:

  1. Interaction with the environment: In keeping with the typical LCS model, the environment provides as input to the system a series of sensory situations σ(t) ∈ {0,1}^L, where L is the number of bits in each situation. In response, the system executes actions α(t) ∈ {a_1, …, a_n} upon the environment. Each action results in a scalar reward ρ(t).

  2. A classifier in XCS: XCS keeps a population of classifiers which represent its knowledge about the problem. Each classifier is a condition-action-prediction rule with the following parts: the condition C ∈ {0,1,#}^L, the action A ∈ {a_1, …, a_n} and the prediction p. Furthermore, each classifier keeps certain additional parameters: the prediction error ε, the fitness F, the experience exp, the time stamp ts, the action set size as and the numerosity.

  3. The different sets: Four different sets need to be considered in XCS: the population [P], the match set [M], the action set [A] and the previous action set [A]_{-1}.

As a result, the knowledge of the algorithm is represented by a set of rules or classifiers, each with a certain fitness. When classifying unseen examples, every rule that matches the input votes according to its prediction and fitness, and the class with the most votes is chosen as the output, as sketched below.
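
As an illustration only, this vote could be sketched in Python as follows; the dictionary fields and the matches helper are hypothetical stand-ins for the classifier structure described above.

    from collections import defaultdict

    def classify(population, example, matches):
        """Fitness-weighted vote: each matching classifier adds
        prediction * fitness to the vote of its advocated class."""
        votes = defaultdict(float)
        for cl in population:
            if matches(cl["condition"], example):
                votes[cl["action"]] += cl["prediction"] * cl["fitness"]
        # The class with the largest accumulated vote is the output
        return max(votes, key=votes.get) if votes else None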

GASSIST algorithm. The Genetic Algorithms based claSSIfier sySTem (GASSIST) (Bacardit and Garrell 2007) is a Pittsburgh-style classifier system based on GABIL (De Jong et al. 1993), from which it takes the semantically correct crossover operator. The main features of this classifier system are as follows:

  1. General operators and policies:

     • Matching strategy: The matching process follows an "if ... then ... else if ... then ..." structure, usually called a decision list (Rivest 1987).

     • Mutation operators: When an individual is selected for mutation, a random gene is chosen inside its chromosome to be mutated.

  2. Control of the individuals' length: This control is achieved using two different operators:

     • Rule deletion: This operator deletes from an individual the rules that do not match any training example.

     • Selection bias using the individual size: Tournament selection is used, where the tournament criterion is given by an operator called "hierarchical selection", defined as follows (a sketch in code appears after this list):

       • If |accuracy_a − accuracy_b| < threshold, then:

         • If length_a < length_b, then a is better than b.

         • If length_a > length_b, then b is better than a.

         • If length_a = length_b, then the general case is used.

       • Otherwise, the general case is used: the individual with the higher fitness is selected.

  3. Knowledge representations:

     • Rule representation for symbolic or discrete attributes: GASSIST uses the GABIL (De Jong et al. 1993) representation for this kind of attribute.

     • Rule representation for real-valued attributes: For GASSIST-ADI, the representation is based on the adaptive discretization intervals rule representation (Bacardit and Garrell 2003; Bacardit 2004).
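
The hierarchical selection criterion above can be summarized with a short Python sketch; the Individual fields are our own hypothetical naming of the quantities involved.

    from dataclasses import dataclass

    @dataclass
    class Individual:        # hypothetical container for the quantities above
        accuracy: float
        fitness: float
        length: int          # size of the individual (number of rules)

    def tournament_winner(a, b, threshold):
        """Hierarchical selection: when accuracies differ by less than
        `threshold`, the shorter individual wins; otherwise the fitter one."""
        if abs(a.accuracy - b.accuracy) < threshold:
            if a.length < b.length:
                return a
            if a.length > b.length:
                return b
        # General case (also used when the lengths are equal)
        return a if a.fitness >= b.fitness else b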

HIDER algorithm. HIerarchical DEcision Rules (HIDER) (Aguilar-Ruiz et al. 2000) produces a hierarchical set of rules, which may be viewed as a decision list. A real-coded GA performs the search process used to extract the rule list. The elements of this procedure are described below.

  1. Coding: Each rule is represented by an individual (chromosome) in which, for each attribute, two genes define the lower and upper bounds of the rule's interval.

  2. Algorithm: The algorithm is a typical sequential-covering GA. It chooses the best individual of the evolutionary process and transforms it into a rule, which is then used to eliminate data from the training file (Venturini 1993). Initially, the set of rules R is empty; in each iteration a rule is included in R and the training file is reduced by eliminating the examples covered by the description of the rule r, independently of their class, as sketched below.
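
A minimal Python sketch of this sequential-covering loop follows; the helpers evolve_rule (runs the GA and returns the best individual as a rule) and covers (tests whether a rule's intervals contain an example) are hypothetical.

    def sequential_covering(training_set, evolve_rule, covers):
        """Build the rule list R by repeatedly evolving a rule and
        removing the examples it covers, regardless of their class."""
        rules = []                                   # R starts empty
        while training_set:
            rule = evolve_rule(training_set)
            rules.append(rule)                       # include the rule in R
            remaining = [ex for ex in training_set if not covers(rule, ex)]
            if len(remaining) == len(training_set):  # guard: nothing covered
                break
            training_set = remaining
        return rules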

The main GA operators are defined as follows:

  (a) Initialization: First, an example is randomly selected from the training file for each individual of the population. Afterwards, an interval to which the example belongs is obtained.

  (b) Crossover: Let [l_i^j, u_i^j] and [l_i^k, u_i^k] be the intervals of two parents, j and k, for the same attribute i. From these parents one child is generated by selecting values that satisfy l ∈ [min(l_i^j, l_i^k), max(l_i^j, l_i^k)] and u ∈ [min(u_i^j, u_i^k), max(u_i^j, u_i^k)] (see the sketch after this list).

  (c) Mutation: A small value is subtracted from or added to the mutated boundary, depending on whether it is the lower or the upper one, respectively.

  (d) Fitness function: The fitness function f addresses a two-objective optimization, trying to maximize the number of correctly classified examples and to minimize the number of errors.
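
The interval crossover in (b) can be illustrated with a short Python sketch; representing a parent as a list of (l, u) intervals, one per attribute, is our assumption.

    import random

    def interval_crossover(parent_j, parent_k):
        """For each attribute i, draw the child's bounds uniformly from
        [min(l_i^j, l_i^k), max(l_i^j, l_i^k)] and
        [min(u_i^j, u_i^k), max(u_i^j, u_i^k)]."""
        child = []
        for (lj, uj), (lk, uk) in zip(parent_j, parent_k):
            l = random.uniform(min(lj, lk), max(lj, lk))
            u = random.uniform(min(uj, uk), max(uj, uk))
            child.append((l, u))
        return child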


Cite this article

García, S., Fernández, A., Luengo, J. et al. A study of statistical techniques and performance measures for genetics-based machine learning: accuracy and interpretability. Soft Comput 13, 959–977 (2009). https://doi.org/10.1007/s00500-008-0392-y
