Evolutionary rule-based systems for imbalanced data sets

Orriols-Puig, Albert; Bernadó-Mansilla, Ester

doi:10.1007/s00500-008-0319-7

Focus
Published: 27 May 2008

Volume 13, pages 213–225, (2009)
Soft Computing

Abstract

This paper investigates the capabilities of evolutionary on-line rule-based systems, also called learning classifier systems (LCSs), for extracting knowledge from imbalanced data. While some learners may suffer from class imbalances and from instances sparsely distributed across the feature space, we show that LCSs are flexible methods that can be adapted to detect such cases and find suitable models. Results on artificial data sets specifically designed to test LCSs on imbalanced data show that they are able to extract knowledge from highly imbalanced domains. When applied to real-world problems, LCSs prove to be among the most robust methods compared with instance-based learners, decision trees, and support vector machines. Moreover, all the learners benefit from re-sampling techniques. Although no single re-sampling technique performs best on all data sets and for all learners, those based on over-sampling seem to perform better on average. The paper adapts and analyzes LCSs for challenging imbalanced data sets and lays the groundwork for further study of which combination of re-sampling technique and learner best suits a specific kind of problem.
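
To make the idea of over-sampling concrete, the following is a minimal sketch of random over-sampling, the simplest member of that family: minority-class instances are duplicated at random until every class matches the size of the largest one. It is an illustration only and is not claimed to be one of the techniques evaluated in the paper; the function name `random_oversample` and its `seed` parameter are assumptions introduced for this example.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Balance classes by duplicating randomly chosen minority instances.

    X is a list of feature vectors and y the matching list of class labels.
    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the largest class

    # Start from the original data so no instance is discarded.
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Pool of original instances belonging to this class.
        pool = [x for x, lab in zip(X, y) if lab == label]
        # Draw with replacement until the class reaches the target size.
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    X = [[i] for i in range(10)]
    y = [0] * 8 + [1] * 2  # 8 majority instances, 2 minority instances
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))
```

Re-sampling of this kind would normally be applied to the training partition only, so that the test data keep the original class distribution.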

Author information

Authors and Affiliations

Grup de Recerca en Sistemes Intel·ligents, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull, Quatre Camins 2, 08022, Barcelona, Spain
Albert Orriols-Puig & Ester Bernadó-Mansilla

Corresponding author

Correspondence to Albert Orriols-Puig.

About this article

Cite this article

Orriols-Puig, A., Bernadó-Mansilla, E. Evolutionary rule-based systems for imbalanced data sets. Soft Comput 13, 213–225 (2009). https://doi.org/10.1007/s00500-008-0319-7

Published: 27 May 2008
Issue Date: February 2009
DOI: https://doi.org/10.1007/s00500-008-0319-7
