Abstract
While many papers propose innovative methods for constructing individual rules in separate-and-conquer rule learning algorithms, comparatively few study the heuristic rule evaluation functions these algorithms use to ensure that the selected rules combine into a good rule set. Underestimating the impact of this component has led to suboptimal design choices in many algorithms. The main goal of this paper is to demonstrate the importance of heuristic rule evaluation functions by improving existing rule induction techniques, and to provide guidelines for algorithm designers. We first select optimal heuristic rule evaluation functions for several metaheuristic-based algorithms and empirically compare the resulting heuristics across algorithms. This yields large and significant improvements in predictive accuracy for two of the techniques. We find that, although no single choice is optimal for all algorithms, good default choices can be shared across algorithms with similar search biases, so a near-optimal selection for a new algorithm can be found with minor experimental tuning. Lastly, a major contribution is made towards balancing a model’s predictive accuracy with its comprehensibility. We construct a Pareto front of optimal solutions for this trade-off and show that gains in comprehensibility and/or accuracy are possible for the techniques studied. Because the heuristics are parametrized, users can flexibly select the balance between accuracy and comprehensibility that their rule-mining application requires.
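To make the accuracy/comprehensibility trade-off concrete, the sketch below shows how a Pareto front of non-dominated rule sets can be extracted once every candidate is scored on both criteria. This is an illustrative sketch rather than the paper's implementation; it assumes comprehensibility is proxied by the negated total number of rule conditions, so that higher values are better on both axes.

```python
# Illustrative sketch, not the paper's code: keep only the non-dominated
# candidates when each rule set is scored as (accuracy, comprehensibility),
# with larger values better on both criteria.

def pareto_front(candidates):
    """Return the candidates not Pareto-dominated by any other candidate."""
    front = []
    for i, a in enumerate(candidates):
        dominated = any(
            b[0] >= a[0] and b[1] >= a[1] and (b[0] > a[0] or b[1] > a[1])
            for j, b in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append(a)
    return front

# Hypothetical example: three rule sets scored as (accuracy, -total conditions);
# the second is dominated by the first and drops out of the front.
rule_sets = [(0.86, -12), (0.84, -15), (0.79, -6)]
print(pareto_front(rule_sets))  # -> [(0.86, -12), (0.79, -6)]
```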



Notes
A more general approach is the weighted covering strategy, in which the weights of covered examples are merely reduced rather than set to zero. Although separate-and-conquer is a special case of this strategy, it is far more widespread. (A generic sketch of both strategies follows these notes.)
Several terms are encountered in the literature that refer to heuristic rule evaluation functions. Alternative terminology includes ‘rule learning heuristic’, ‘(rule) evaluation function’, ‘fitness function’ and ‘(rule) quality measure’. For the sake of brevity, the term ‘heuristic’ is used in this text where applicable.
The RIPPER algorithm is an exception, as corrections can still be made in its post-processing step.
Negative fitness values are also not allowed in these algorithms. All parametrized heuristics described in the previous section were therefore shifted by a constant so that their lowest possible value is exactly zero (see the selection sketch following these notes).
AntMiner is available at http://sourceforge.net/projects/guiantminer/
AntMiner+ is available at http://www.antminerplus.com
PSO/ACO2 is available at http://sourceforge.net/projects/psoaco2/
An implementation of HIDER is available as part of the KEEL software project (Alcalá-Fdez et al. 2009, 2011) at http://www.keel.es/
The probability that an individual is selected is proportional to its fitness value. This method thus uses the exact values returned by the fitness function (see the selection sketch following these notes).
We use the RIPPER implementation included in the Weka project (Hall et al. 2009) at http://www.cs.waikato.ac.nz/ml/weka/
When non-gain heuristics are used in our experiments, the best rule found during the search is returned, since the last rule constructed is suboptimal in that case.
anneal, audiology, breast-cancer, cleveland-heart-disease, contact-lenses, credit, glass2, glass, hepatitis, horse-colic, hypothyroid, iris, krkp, labor, lymphography, monk1, monk2, monk3, mushroom, sick-euthyroid, soybean, tic-tac-toe, titanic, vote-1, vote, vowel, wine
auto-mpg, autos, balance-scale, balloons, breast-w, breast-w-d, bridges2, credit-g, diabetes, echocardiogram, flag, hayes-roth, heart-h, heart-statlog, ionosphere, machine, primarytumor, promoters, segment, solar-flare, sonar, vehicle, zoo
The UCI datasets are available at http://archive.ics.uci.edu/ml/. The Titanic dataset is available as part of the Delve project of the University of Toronto at http://www.cs.toronto.edu/~delve/.
The statistical tool used to perform the Friedman test and advanced post-hoc procedures can be found at http://sci2s.ugr.es/keel/multipleTest.zip
The data of these experiments is available online at http://www.antminerplus.com
The ACO search starts from an unspecialized environment and has to learn to add specific terms to the rules (top-down). In PSO and genetic algorithms, initializing the swarm/population with either general or specific particles/solutions determines the top-down or bottom-up nature. HIDER initializes with very specific instances (bottom-up). PSO/ACO2 adds conditions to rules in two phases, both top-down: the first is a limited ACO/PSO hybrid search, the second a limited PSO search with a non-specializing seeding.
Our tuning data show large and significant positive Spearman correlations between the best-performing parameter values observed for the different parametrized heuristics. This indicates that the bias requirements of the datasets are somewhat stable (see the correlation sketch following these notes).
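As a concrete illustration of the covering strategies contrasted in the first note above, the following sketch shows a generic separate-and-conquer loop next to its weighted-covering generalization. The `learn_one_rule` routine and the `covers` method are hypothetical placeholders for any single-rule learner guided by a heuristic rule evaluation function; this is not code from the algorithms studied in the paper.

```python
# Illustrative sketch only: separate-and-conquer removes covered examples,
# weighted covering merely reduces their weights.

def separate_and_conquer(examples, learn_one_rule, min_quality=0.0):
    """Learn rules one at a time, removing the examples each rule covers."""
    rule_list, remaining = [], list(examples)
    while remaining:
        rule, quality = learn_one_rule(remaining)
        if quality <= min_quality:
            break  # no acceptable rule left for the remaining examples
        rule_list.append(rule)
        remaining = [e for e in remaining if not rule.covers(e)]
    return rule_list

def weighted_covering(examples, learn_one_rule, decay=0.5, rounds=10):
    """More general strategy: covered examples are down-weighted, not removed.

    With decay=0.0 the weights of covered examples drop to zero immediately,
    which reduces to the separate-and-conquer special case above.
    """
    weights = {e: 1.0 for e in examples}
    rule_list = []
    for _ in range(rounds):
        active = [e for e in examples if weights[e] > 0]
        if not active:
            break
        rule, _ = learn_one_rule(active)
        rule_list.append(rule)
        for e in examples:
            if rule.covers(e):
                weights[e] *= decay
    return rule_list
```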
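The notes above on disallowing negative fitness values and on fitness-proportionate selection can be illustrated together. The sketch below is a generic, assumed implementation (the names `heuristic` and `offset` are placeholders), not code taken from the evaluated algorithms.

```python
import random

# Illustrative sketch only: a heuristic shifted so its minimum attainable value
# is exactly zero, combined with fitness-proportionate (roulette-wheel)
# selection, where selection probability is proportional to the fitness value.

def shifted_fitness(rule, heuristic, offset):
    """`offset` equals minus the lowest value the parametrized heuristic can take."""
    return heuristic(rule) + offset

def roulette_wheel(population, fitnesses):
    """Select one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total == 0:  # all-zero fitness: fall back to uniform selection
        return random.choice(population)
    pick, cumulative = random.uniform(0, total), 0.0
    for individual, fit in zip(population, fitnesses):
        cumulative += fit
        if pick <= cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off
```

The standard library call `random.choices(population, weights=fitnesses, k=1)[0]` gives the same fitness-proportionate behaviour.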
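Finally, the Spearman correlations mentioned in the last note can be reproduced with standard tooling. The two arrays below are invented placeholders standing in for the best-performing parameter values observed per tuning dataset under two parametrized heuristics.

```python
# Illustrative only: correlating the best-performing parameter values observed
# per dataset for two parametrized heuristics (the numbers are invented).
from scipy.stats import spearmanr

best_param_heuristic_a = [0.1, 0.4, 0.4, 0.7, 0.9, 0.2, 0.6]
best_param_heuristic_b = [0.2, 0.3, 0.5, 0.6, 0.8, 0.1, 0.7]

rho, p_value = spearmanr(best_param_heuristic_a, best_param_heuristic_b)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```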
References
Aguilar-Ruiz J, Riquelme JC, Toro M (2003) Evolutionary learning of hierarchical decision rules. IEEE Trans Syst Man Cybern Part B Cybern 33:324–331
Aguilar-Ruiz JS, Giráldez R, Santos JCR (2007) Natural encoding for evolutionary supervised learning. IEEE Trans Evol Comput 11(4):466–479
Alcalá-Fdez J, Sánchez L, García S, del Jesus M, Ventura S, Garrell J, Otero J, Romero C, Bacardit J, Rivas V, Fernández J, Herrera F (2009) KEEL: a software tool to assess evolutionary algorithms for data mining problems. Soft Comput 13(3):307–318
Alcalá-Fdez J, Fernandez A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011) KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Mult-Valued Logic Soft Comput 17(2–3):255–287
An A, Cercone N (2000) Rule quality measures improve the accuracy of rule induction: an experimental approach. In: Proceedings of 12th Int Symp Found Intell Syst, ISMIS’00, pp 119–129
Andrews R, Diederich J, Tickle AB (1995) Survey and critique of techniques for extracting rules from trained artificial neural networks. Knowl-Based Syst 8(6):373–389
Baesens B, Setiono R, Mues C, Vanthienen J (2003a) Using neural network rule extraction and decision tables for credit-risk evaluation. Manag Sci 49(3):312–329
Baesens B, Van Gestel T, Viaene S, Stepanova M, Suykens J, Vanthienen J (2003b) Benchmarking state-of-the-art classification algorithms for credit scoring. J Oper Res Soc 54(6):627–635
Booker LB, Goldberg DE, Holland JH (1989) Classifier systems and genetic algorithms. Artif Intell 40(1–3):235–282
Cestnik B (1990) Estimating probabilities: a crucial task in machine learning. In: 9th Eur Conf, Artif Intell, ECAI’90, pp 147–149
Clark P, Niblett T (1989) The CN2 induction algorithm. Mach Learn 3(4):261–283
Cohen W (1995) Fast effective rule induction. In: Proc 12th Int Conf, Mach Learn, ICML’95, pp 115–123
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Dorigo M, Maniezzo V, Colorni A (1996) Ant system: optimization by a colony of cooperating agents. IEEE Trans Syst Man Cybern Part B Cybern 26(1):29–41
Fayyad UM, Irani KB (1992) On the handling of continuous-valued attributes in decision tree generation. Mach Learn 8(1):87–102
Fernández A, García S, Luengo J, Bernadó-Mansilla E, Herrera F (2010) Genetics-based machine learning for rule induction: state of the art, taxonomy, and comparative study. IEEE Trans Evol Comput 14(6):913–941
Freitas AA (2003) A survey of evolutionary algorithms for data mining and knowledge discovery. In: Ghosh A, Tsutsiu S (eds) Advances in evolutionary computing: theory and applications. Springer-Verlag New York Inc., New York, NY, USA, pp 819–845
Fürnkranz J (1999) Separate-and-conquer rule learning. Artif Intell Rev 13(1):3–54
Fürnkranz J, Flach P (2003) An analysis of rule evaluation metrics. In: Proc 20th Int Conf, Mach Learn, ICML’03, pp 202–209
Fürnkranz J, Flach P (2005) ROC ‘n’ rule learning—towards a better understanding of covering algorithms. Mach Learn 58(1):39–77
García S, Herrera F (2008) An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons. J Mach Learn Res 9:2677–2694
Greene DP, Smith SF (1993) Competition-based induction of decision models from examples. Mach Learn 13(2–3):229–257
Hall M, Holmes G (2003) Benchmarking attribute selection techniques for discrete class data mining. IEEE Trans Knowl Data Eng 15(6):1437–1447
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor Newsl 11:10–18
Hettich S, Bay SD (1996) The UCI KDD archive. http://kdd.ics.uci.edu
Holden N, Freitas A (2005) A hybrid particle swarm/ant colony algorithm for the classification of hierarchical biological data. In: Proc IEEE Swarm Intell Symp, SIS’05, pp 100–107
Holden N, Freitas AA (2008) A hybrid PSO/ACO algorithm for discovering classification rules in data mining. J Artif Evol App 2008:1–12
Holte RC, Acker LE, Porter BW (1989) Concept learning and the problem of small disjuncts. In: Proc 11th Int Joint Conf, Artif Intell, IJCAI’89, pp 813–818
Janssen F, Fürnkranz J (2009) A re-evaluation of the over-searching phenomenon in inductive rule learning. In: Proc SIAM Int Conf Data Min, SDM’09, pp 329–340
Janssen F, Fürnkranz J (2010) On the quest for optimal rule learning heuristics. Mach Learn 78(3):343–379
Kira K, Rendell L (1992) The feature selection problem: traditional methods and a new algorithm. In: Proc 10th Natl Conf, Artif Intell, AAAI’92, pp 129–134
Klösgen W (1992) Problems for knowledge discovery in databases and their treatment in the statistics interpreter EXPLORA. Int J Intell Syst 7(7):649–673
Kononenko I (1994) Estimating attributes: analysis and extensions of RELIEF. In: Eur Conf, Mach Learn, ECML’94, pp 171–182
Liu B, Abbass HA, McKay B (2002a) Density-based heuristic for rule discovery with ant-miner. In: Proc 6th Australasia-Japan Joint Workshop Intell Evol Syst, AJWIS’02, pp 180–184
Liu H, Hussain F, Tan CL, Dash M (2002b) Discretization: an enabling technique. Data Min Knowl Discov 6(4):393–423
Liu B, Abbass H, McKay B (2003) Classification rule discovery with ant colony optimization. In: Proc IEEE/WIC Int Conf Intell Agent Tech, IAT’03, pp 83–88
Lopes HS, Coutinho MS, Lima WC (1997) An evolutionary approach to simulate cognitive feedback learning in medical domain. In: Sanchez E, Shibata T, Zadeh L (eds) Genetic algorithms and fuzzy logic systems: soft computing perspectives. World Scientific, Singapore, pp 193–207
Martens D, Baesens B, Van Gestel T, Vanthienen J (2007a) Comprehensible credit scoring models using rule extraction from support vector machines. Eur J Oper Res 183(3):1466–1476
Martens D, De Backer M, Haesen R, Vanthienen J, Snoeck M, Baesens B (2007b) Classification with ant colony optimization. IEEE Trans Evol Comput 11(5):651–665
Martens D, Baesens B, Fawcett T (2011) Editorial survey: swarm intelligence for data mining. Mach Learn 82(1):1–42
Novak PK, Lavrač N, Webb GI (2009) Supervised descriptive rule discovery: a unifying survey of contrast set, emerging pattern and subgroup mining. J Mach Learn Res 10:377–403
Montes de Oca M, Stützle T, Birattari M, Dorigo M (2009) Frankenstein’s PSO: a composite particle swarm optimization algorithm. IEEE Trans Evol Comput 13(5):1120–1132
Otero FEB, Freitas AA, Johnson CG (2009) Handling continuous attributes in ant colony classification algorithms. In: Proc IEEE Symp Comput Intell Data Min, IEEE, CIDM’09, pp 225–231
Pappa G, Freitas A (2009) Evolving rule induction algorithms with multi-objective grammar-based genetic programming. Knowl Inf Syst 19(3):283–309
Parpinelli RS, Lopes HS, Freitas AA (2001) An ant colony based system for data mining: Applications to medical data. In: Proc Genet Evol Comput Conf, GECCO’01, pp 791–797
Parpinelli R, Lopes H, Freitas A (2002) Data mining with an ant colony optimization algorithm. IEEE Trans Evol Comput 6(4):321–332
Pazzani M, Mani S, Shankle W (2001) Acceptance by medical experts of rules generated by machine learning. Methods Inf Med 40(5):380–385
Quinlan JR (1990) Learning logical definitions from relations. Mach Learn 5(3):239–266
Quinlan JR (1993) C4.5 programs for machine learning. Morgan Kaufmann Publishers Inc., San Francisco, CA
Salama K, Abdelbar A (2011) Exploring different rule quality evaluation functions in ACO-based classification algorithms. In: IEEE Symp Swarm Intell, SIS’11, pp 1–8
Smith SF (1980) A learning system based on genetic adaptive algorithms. PhD thesis, Pittsburgh, PA
Sousa T, Silva A, Neves A (2004) Particle swarm based data mining algorithms for classification tasks. Parallel Comput 30(5–6):767–783
Steuer R (1986) Multiple criteria optimization: Theory, computation and application. Wiley, New York, NY
Stützle T, Hoos HH (2000) MAX-MIN ant system. Future Generat Comput Syst 16(9):889–914
Suykens JAK, Van Gestel T, Brabanter JD, De Moor B, Vandewalle J (2002) Least squares support vector machines. World Scientific, Singapore
Tan PN, Kumar V, Srivastava J (2002) Selecting the right interestingness measure for association patterns. In: Proc 8th ACM SIGKDD Int Conf Knowl Discov Data Min, KDD’02, pp 32–41
Tan PN, Steinbach M, Kumar V (2005) Introduction to data mining. Addison Wesley, Boston, MA
Van Gestel T, Suykens J, Baesens B, Viaene S, Vanthienen J, Dedene G, De Moor B, Vandewalle J (2004) Benchmarking least squares support vector machine classifiers. Mach Learn 54(1):5–32
van Rijsbergen CJ (1979) Information retrieval. Butterworth-Heinemann, Newton, MA
Venturini G (1993) SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts. In: Proc Eur Conf, Mach Learn, ECML’93, pp 280–296
Verbeke W, Martens D, Mues C, Baesens B (2011) Building comprehensible customer churn prediction models with advanced rule induction techniques. Expert Syst Appl 38(3):2354–2364
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco, CA
Wong ML (1998) An adaptive knowledge-acquisition system using generic genetic programming. Expert Syst Appl 15(1):47–58
Wrobel S (1997) An algorithm for multi-relational discovery of subgroups. In: Proc 1st Eur Symp Princ Data Min Knowl Discov, PKDD’97, pp 78–87
Acknowledgments
This work was carried out using the Stevin Supercomputer Infrastructure at Ghent University. We would like to thank the Flemish Research Council for the financial support (FWO Odysseus grant B.0915.09). We are also grateful to the creators of the open source implementations used in this work.
Additional information
Responsible editor: Johannes Fürnkranz.
Cite this article
Minnaert, B., Martens, D., De Backer, M. et al. To tune or not to tune: rule evaluation for metaheuristic-based sequential covering algorithms. Data Min Knowl Disc 29, 237–272 (2015). https://doi.org/10.1007/s10618-013-0339-5