Abstract
While many papers propose innovative methods for constructing individual rules in separate-and-conquer rule learning algorithms, comparatively few study the heuristic rule evaluation functions these algorithms use to ensure that the selected rules combine into a good rule set. Underestimating the impact of this component has led to suboptimal design choices in many algorithms. The main goal of this paper is to demonstrate the importance of heuristic rule evaluation functions by improving existing rule induction techniques, and to provide guidelines for algorithm designers. We first select optimal heuristic rule evaluation functions for several metaheuristic-based algorithms and empirically compare the resulting heuristics across algorithms. This yields large and significant improvements in predictive accuracy for two of the techniques. We find that, although no single choice is optimal for all algorithms, good default choices can be shared across algorithms with similar search biases, so a near-optimal selection for a new algorithm can be found with minor experimental tuning. Lastly, a major contribution is made towards balancing a model’s predictive accuracy with its comprehensibility. We construct a Pareto front of optimal solutions for this trade-off and show that gains in comprehensibility and/or accuracy are possible for the techniques studied. Because the heuristics are parametrized, users can flexibly select the balance between accuracy and comprehensibility that their rule-mining application requires.
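To make the accuracy/comprehensibility trade-off concrete, the sketch below shows how a Pareto front of non-dominated rule sets can be extracted once every candidate is scored on both criteria. This is an illustrative sketch rather than the paper's implementation; it assumes comprehensibility is proxied by the negated total number of rule conditions, so that higher values are better on both axes.

```python
# Illustrative sketch, not the paper's code: keep only the non-dominated
# candidates when each rule set is scored as (accuracy, comprehensibility),
# with larger values better on both criteria.

def pareto_front(candidates):
    """Return the candidates not Pareto-dominated by any other candidate."""
    front = []
    for i, a in enumerate(candidates):
        dominated = any(
            b[0] >= a[0] and b[1] >= a[1] and (b[0] > a[0] or b[1] > a[1])
            for j, b in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append(a)
    return front

# Hypothetical example: three rule sets scored as (accuracy, -total conditions);
# the second is dominated by the first and drops out of the front.
rule_sets = [(0.86, -12), (0.84, -15), (0.79, -6)]
print(pareto_front(rule_sets))  # -> [(0.86, -12), (0.79, -6)]
```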



Notes
A more general approach is the weighted covering strategy, in which the weights of covered examples are merely reduced rather than set to zero. Although separate-and-conquer is a special case of this strategy, it is far more widespread. (A generic sketch of both strategies follows these notes.)
Several terms are encountered in the literature that refer to heuristic rule evaluation functions. Alternative terminology includes ‘rule learning heuristic’, ‘(rule) evaluation function’, ‘fitness function’ and ‘(rule) quality measure’. For the sake of brevity, the term ‘heuristic’ is used in this text where applicable.
The RIPPER algorithm is an exception, as corrections can still be made in its post-processing step.
Negative fitness values are also not allowed in these algorithms. All parametrized heuristics described in the previous section were therefore shifted by a constant so that their lowest possible value is exactly zero (see the selection sketch following these notes).
AntMiner is available at http://sourceforge.net/projects/guiantminer/
AntMiner+ is available at http://www.antminerplus.com
PSO/ACO2 is available at http://sourceforge.net/projects/psoaco2/
An implementation of HIDER is available as part of the KEEL software project (Alcalá-Fdez et al. 2009, 2011) at http://www.keel.es/
The probability that an individual is selected is proportional to its fitness value. This method thus uses the exact values returned by the fitness function (see the selection sketch following these notes).
We use the RIPPER implementation included in the Weka project (Hall et al. 2009) at http://www.cs.waikato.ac.nz/ml/weka/
When non-gain heuristics are used in our experiments, the best rule found during the search is returned, since the last rule constructed is suboptimal in that case.
anneal, audiology, breast-cancer, cleveland-heart-disease, contact-lenses, credit, glass2, glass, hepatitis, horse-colic, hypothyroid, iris, krkp, labor, lymphography, monk1, monk2, monk3, mushroom, sick-euthyroid, soybean, tic-tac-toe, titanic, vote-1, vote, vowel, wine
auto-mpg, autos, balance-scale, balloons, breast-w, breast-w-d, bridges2, credit-g, diabetes, echocardiogram, flag, hayes-roth, heart-h, heart-statlog, ionosphere, machine, primarytumor, promoters, segment, solar-flare, sonar, vehicle, zoo
The UCI datasets are available at http://archive.ics.uci.edu/ml/. The Titanic dataset is available as part of the Delve project of the University of Toronto at http://www.cs.toronto.edu/~delve/.
The statistical tool used to perform the Friedman test and advanced post-hoc procedures can be found at http://sci2s.ugr.es/keel/multipleTest.zip
The data of these experiments is available online at http://www.antminerplus.com
The ACO search starts from an unspecialized environment and has to learn to add specific terms to the rules (top-down). In PSO and genetic algorithms, initializing the swarm/population with either general or specific particles/solutions determines the top-down or bottom-up nature. HIDER initializes with very specific instances (bottom-up). PSO/ACO2 adds conditions to rules in two phases, both top-down: the first is a limited ACO/PSO hybrid search, the second a limited PSO search with a non-specializing seeding.
Our tuning data show large and significant positive Spearman correlations between the best-performing parameter values observed for the different parametrized heuristics. This indicates that the bias requirements of the datasets are somewhat stable (see the correlation sketch following these notes).
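As a concrete illustration of the covering strategies contrasted in the first note above, the following sketch shows a generic separate-and-conquer loop next to its weighted-covering generalization. The `learn_one_rule` routine and the `covers` method are hypothetical placeholders for any single-rule learner guided by a heuristic rule evaluation function; this is not code from the algorithms studied in the paper.

```python
# Illustrative sketch only: separate-and-conquer removes covered examples,
# weighted covering merely reduces their weights.

def separate_and_conquer(examples, learn_one_rule, min_quality=0.0):
    """Learn rules one at a time, removing the examples each rule covers."""
    rule_list, remaining = [], list(examples)
    while remaining:
        rule, quality = learn_one_rule(remaining)
        if quality <= min_quality:
            break  # no acceptable rule left for the remaining examples
        rule_list.append(rule)
        remaining = [e for e in remaining if not rule.covers(e)]
    return rule_list

def weighted_covering(examples, learn_one_rule, decay=0.5, rounds=10):
    """More general strategy: covered examples are down-weighted, not removed.

    With decay=0.0 the weights of covered examples drop to zero immediately,
    which reduces to the separate-and-conquer special case above.
    """
    weights = {e: 1.0 for e in examples}
    rule_list = []
    for _ in range(rounds):
        active = [e for e in examples if weights[e] > 0]
        if not active:
            break
        rule, _ = learn_one_rule(active)
        rule_list.append(rule)
        for e in examples:
            if rule.covers(e):
                weights[e] *= decay
    return rule_list
```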
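The notes above on disallowing negative fitness values and on fitness-proportionate selection can be illustrated together. The sketch below is a generic, assumed implementation (the names `heuristic` and `offset` are placeholders), not code taken from the evaluated algorithms.

```python
import random

# Illustrative sketch only: a heuristic shifted so its minimum attainable value
# is exactly zero, combined with fitness-proportionate (roulette-wheel)
# selection, where selection probability is proportional to the fitness value.

def shifted_fitness(rule, heuristic, offset):
    """`offset` equals minus the lowest value the parametrized heuristic can take."""
    return heuristic(rule) + offset

def roulette_wheel(population, fitnesses):
    """Select one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total == 0:  # all-zero fitness: fall back to uniform selection
        return random.choice(population)
    pick, cumulative = random.uniform(0, total), 0.0
    for individual, fit in zip(population, fitnesses):
        cumulative += fit
        if pick <= cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off
```

The standard library call `random.choices(population, weights=fitnesses, k=1)[0]` gives the same fitness-proportionate behaviour.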
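Finally, the Spearman correlations mentioned in the last note can be reproduced with standard tooling. The two arrays below are invented placeholders standing in for the best-performing parameter values observed per tuning dataset under two parametrized heuristics.

```python
# Illustrative only: correlating the best-performing parameter values observed
# per dataset for two parametrized heuristics (the numbers are invented).
from scipy.stats import spearmanr

best_param_heuristic_a = [0.1, 0.4, 0.4, 0.7, 0.9, 0.2, 0.6]
best_param_heuristic_b = [0.2, 0.3, 0.5, 0.6, 0.8, 0.1, 0.7]

rho, p_value = spearmanr(best_param_heuristic_a, best_param_heuristic_b)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```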
References
Aguilar-Ruiz J, Riquelme JC, Toro M (2003) Evolutionary learning of hierarchical decision rules. IEEE Trans Syst Man Cybern Part B Cybern 33:324–331
Aguilar-Ruiz JS, Giráldez R, Santos JCR (2007) Natural encoding for evolutionary supervised learning. IEEE Trans Evol Comput 11(4):466–479
Alcalá-Fdez J, Sánchez L, García S, del Jesus M, Ventura S, Garrell J, Otero J, Romero C, Bacardit J, Rivas V, Fernández J, Herrera F (2009) KEEL: a software tool to assess evolutionary algorithms for data mining problems. Soft Comput 13(3):307–318
Alcalá-Fdez J, Fernandez A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011) KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Mult-Valued Logic Soft Comput 17(2–3):255–287
An A, Cercone N (2000) Rule quality measures improve the accuracy of rule induction: an experimental approach. In: Proceedings of 12th Int Symp Found Intell Syst, ISMIS’00, pp 119–129
Andrews R, Diederich J, Tickle AB (1995) Survey and critique of techniques for extracting rules from trained artificial neural networks. Knowl-Based Syst 8(6):373–389
Baesens B, Setiono R, Mues C, Vanthienen J (2003a) Using neural network rule extraction and decision tables for credit-risk evaluation. Manag Sci 49(3):312–329
Baesens B, Van Gestel T, Viaene S, Stepanova M, Suykens J, Vanthienen J (2003b) Benchmarking state-of-the-art classification algorithms for credit scoring. J Oper Res Soc 54(6):627–635
Booker LB, Goldberg DE, Holland JH (1989) Classifier systems and genetic algorithms. Artif Intell 40(1–3):235–282
Cestnik B (1990) Estimating probabilities: a crucial task in machine learning. In: 9th Eur Conf, Artif Intell, ECAI’90, pp 147–149
Clark P, Niblett T (1989) The CN2 induction algorithm. Mach Learn 3(4):261–283
Cohen W (1995) Fast effective rule induction. In: Proc 12th Int Conf, Mach Learn, ICML’95, pp 115–123
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Dorigo M, Maniezzo V, Colorni A (1996) Ant system: optimization by a colony of cooperating agents. IEEE Trans Syst Man Cybern Part B Cybern 26(1):29–41
Fayyad UM, Irani KB (1992) On the handling of continuous-valued attributes in decision tree generation. Mach Learn 8(1):87–102
Fernández A, García S, Luengo J, Bernadó-Mansilla E, Herrera F (2010) Genetics-based machine learning for rule induction: state of the art, taxonomy, and comparative study. IEEE Trans Evol Comput 14(6):913–941
Freitas AA (2003) A survey of evolutionary algorithms for data mining and knowledge discovery. In: Ghosh A, Tsutsiu S (eds) Advances in evolutionary computing: theory and applications. Springer-Verlag New York Inc., New York, NY, USA, pp 819–845
Fürnkranz J (1999) Separate-and-conquer rule learning. Artif Intell Rev 13(1):3–54
Fürnkranz J, Flach P (2003) An analysis of rule evaluation metrics. In: Proc 20th Int Conf, Mach Learn, ICML’03, pp 202–209
Fürnkranz J, Flach P (2005) ROC ‘n’ rule learning—towards a better understanding of covering algorithms. Mach Learn 58(1):39–77
García S, Herrera F (2008) An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons. J Mach Learn Res 9:2677–2694
Greene DP, Smith SF (1993) Competition-based induction of decision models from examples. Mach Learn 13(2–3):229–257
Hall M, Holmes G (2003) Benchmarking attribute selection techniques for discrete class data mining. IEEE Trans Knowl Data Eng 15(6):1437–1447
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor Newsl 11:10–18
Hettich S, Bay SD (1996) The UCI KDD archive. http://kdd.ics.uci.edu
Holden N, Freitas A (2005) A hybrid particle swarm/ant colony algorithm for the classification of hierarchical biological data. In: Proc IEEE Swarm Intell Symp, SIS’05, pp 100–107
Holden N, Freitas AA (2008) A hybrid PSO/ACO algorithm for discovering classification rules in data mining. J Artif Evol App 2008:1–12
Holte RC, Acker LE, Porter BW (1989) Concept learning and the problem of small disjuncts. In: Proc 11th Int Joint Conf, Artif Intell, IJCAI’89, pp 813–818
Janssen F, Fürnkranz J (2009) A re-evaluation of the over-searching phenomenon in inductive rule learning. In: Proc SIAM Int Conf Data Min, SDM’09, pp 329–340
Janssen F, Fürnkranz J (2010) On the quest for optimal rule learning heuristics. Mach Learn 78(3):343–379
Kira K, Rendell L (1992) The feature selection problem: traditional methods and a new algorithm. In: Proc 10th Natl Conf, Artif Intell, AAAI’92, pp 129–134
Klösgen W (1992) Problems for knowledge discovery in databases and their treatment in the statistics interpreter EXPLORA. Int J Intell Syst 7(7):649–673
Kononenko I (1994) Estimating attributes: analysis and extensions of RELIEF. In: Eur Conf, Mach Learn, ECML’94, pp 171–182
Liu B, Abbass HA, McKay B (2002a) Density-based heuristic for rule discovery with ant-miner. In: Proc 6th Australasia-Japan Joint Workshop Intell Evol Syst, AJWIS’02, pp 180–184
Liu H, Hussain F, Tan CL, Dash M (2002b) Discretization: an enabling technique. Data Min Knowl Discov 6(4):393–423
Liu B, Abbass H, McKay B (2003) Classification rule discovery with ant colony optimization. In: Proc IEEE/WIC Int Conf Intell Agent Tech, IAT’03, pp 83–88
Lopes HS, Coutinho MS, Lima WC (1997) An evolutionary approach to simulate cognitive feedback learning in medical domain. In: Sanchez E, Shibata T, Zadeh L (eds) Genetic algorithms and fuzzy logic systems: soft computing perspectives. World Scientific, Singapore, pp 193–207
Martens D, Baesens B, Van Gestel T, Vanthienen J (2007a) Comprehensible credit scoring models using rule extraction from support vector machines. Eur J Oper Res 183(3):1466–1476
Martens D, De Backer M, Haesen R, Vanthienen J, Snoeck M, Baesens B (2007b) Classification with ant colony optimization. IEEE Trans Evol Comput 11(5):651–665
Martens D, Baesens B, Fawcett T (2011) Editorial survey: swarm intelligence for data mining. Mach Learn 82(1):1–42
Novak PK, Lavrač N, Webb GI (2009) Supervised descriptive rule discovery: a unifying survey of contrast set, emerging pattern and subgroup mining. J Mach Learn Res 10:377–403
Montes de Oca M, Stützle T, Birattari M, Dorigo M (2009) Frankenstein’s PSO: a composite particle swarm optimization algorithm. IEEE Trans Evol Comput 13(5):1120–1132
Otero FEB, Freitas AA, Johnson CG (2009) Handling continuous attributes in ant colony classification algorithms. In: Proc IEEE Symp Comput Intell Data Min, IEEE, CIDM’09, pp 225–231
Pappa G, Freitas A (2009) Evolving rule induction algorithms with multi-objective grammar-based genetic programming. Knowl Inf Syst 19(3):283–309
Parpinelli RS, Lopes HS, Freitas AA (2001) An ant colony based system for data mining: Applications to medical data. In: Proc Genet Evol Comput Conf, GECCO’01, pp 791–797
Parpinelli R, Lopes H, Freitas A (2002) Data mining with an ant colony optimization algorithm. IEEE Trans Evol Comput 6(4):321–332
Pazzani M, Mani S, Shankle W (2001) Acceptance by medical experts of rules generated by machine learning. Methods Inf Med 40(5):380–385
Quinlan JR (1990) Learning logical definitions from relations. Mach Learn 5(3):239–266
Quinlan JR (1993) C4.5 programs for machine learning. Morgan Kaufmann Publishers Inc., San Francisco, CA
Salama K, Abdelbar A (2011) Exploring different rule quality evaluation functions in ACO-based classification algorithms. In: IEEE Symp Swarm Intell, SIS’11, pp 1–8
Smith SF (1980) A learning system based on genetic adaptive algorithms. PhD thesis, Pittsburgh, PA
Sousa T, Silva A, Neves A (2004) Particle swarm based data mining algorithms for classification tasks. Parallel Comput 30(5–6):767–783
Steuer R (1986) Multiple criteria optimization: Theory, computation and application. Wiley, New York, NY
Stützle T, Hoos HH (2000) MAX-MIN ant system. Future Generat Comput Syst 16(9):889–914
Suykens JAK, Van Gestel T, Brabanter JD, De Moor B, Vandewalle J (2002) Least squares support vector machines. World Scientific, Singapore
Tan PN, Kumar V, Srivastava J (2002) Selecting the right interestingness measure for association patterns. In: Proc 8th ACM SIGKDD Int Conf Knowl Discov Data Min, KDD’02, pp 32–41
Tan PN, Steinbach M, Kumar V (2005) Introduction to data mining. Addison Wesley, Boston, MA
Van Gestel T, Suykens J, Baesens B, Viaene S, Vanthienen J, Dedene G, De Moor B, Vandewalle J (2004) Benchmarking least squares support vector machine classifiers. Mach Learn 54(1):5–32
van Rijsbergen CJ (1979) Information retrieval. Butterworth-Heinemann, Newton, MA
Venturini G (1993) SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts. In: Proc Eur Conf, Mach Learn, ECML’93, pp 280–296
Verbeke W, Martens D, Mues C, Baesens B (2011) Building comprehensible customer churn prediction models with advanced rule induction techniques. Expert Syst Appl 38(3):2354–2364
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco, CA
Wong ML (1998) An adaptive knowledge-acquisition system using generic genetic programming. Expert Syst Appl 15(1):47–58
Wrobel S (1997) An algorithm for multi-relational discovery of subgroups. In: Proc 1st Eur Symp Princ Data Min Knowl Discov, PKDD’97, pp 78–87
Acknowledgments
This work was carried out using the Stevin Supercomputer Infrastructure at Ghent University. We would like to thank the Flemish Research Council for the financial support (FWO Odysseus grant B.0915.09). We are also grateful to the creators of the open source implementations used in this work.
Additional information
Responsible editor: Johannes Fürnkranz.
Cite this article
Minnaert, B., Martens, D., De Backer, M. et al. To tune or not to tune: rule evaluation for metaheuristic-based sequential covering algorithms. Data Min Knowl Disc 29, 237–272 (2015). https://doi.org/10.1007/s10618-013-0339-5