Abstract
Induction of decision rules plays an important role in machine learning. The main advantages of decision rules are their simplicity and human-interpretable form. Moreover, they are capable of modeling complex interactions between attributes. In this paper, we thoroughly analyze a learning algorithm, called ENDER, which constructs an ensemble of decision rules. This algorithm is tailored for regression and binary classification problems. It uses the boosting approach to learning, which can be treated as a generalization of sequential covering: each new rule is fitted by focusing on the examples that were hardest to classify correctly by the rules already present in the ensemble. We consider different loss functions and minimization techniques often encountered in the boosting framework. The minimization techniques are used to derive impurity measures which control the construction of single decision rules. Properties of four different impurity measures are analyzed with respect to the trade-off between misclassification (discrimination) and coverage (completeness) of a rule. Moreover, we consider regularization consisting of shrinkage and sampling. Finally, we compare the ENDER algorithm with other well-known decision rule learners, such as SLIPPER, LRI and RuleFit.
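The boosting scheme sketched in the abstract can be illustrated with a minimal example. The sketch below makes several simplifying assumptions and is not the authors' implementation: it uses squared-error loss (the regression case), restricts each rule to a single condition of the form "x[a] &lt;= t" or "x[a] &gt; t" (ENDER builds conjunctions of such conditions), and searches thresholds exhaustively. All function names are illustrative.

```python
def fit_rule(X, residuals):
    """Pick the single condition whose covered examples give the largest
    reduction in squared error; the rule's response is the mean residual
    of the covered examples."""
    best = None  # (gain, attribute, threshold, op, response)
    for a in range(len(X[0])):
        for thr in sorted({x[a] for x in X}):
            for op in ("<=", ">"):
                covered = [r for x, r in zip(X, residuals)
                           if (x[a] <= thr) == (op == "<=")]
                if not covered:
                    continue
                resp = sum(covered) / len(covered)
                gain = len(covered) * resp * resp  # SSE reduction
                if best is None or gain > best[0]:
                    best = (gain, a, thr, op, resp)
    return best[1:]

def fit_ensemble(X, y, n_rules=50, shrinkage=0.1):
    """Sequentially add rules; each is fitted to the current residuals,
    i.e. to the examples the ensemble so far handles worst, and its
    response is shrunk by the regularization factor."""
    base = sum(y) / len(y)              # default rule: the global mean
    pred = [base] * len(y)
    rules = []
    for _ in range(n_rules):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        a, thr, op, resp = fit_rule(X, residuals)
        resp *= shrinkage
        rules.append((a, thr, op, resp))
        pred = [p + (resp if (x[a] <= thr) == (op == "<=") else 0.0)
                for x, p in zip(X, pred)]
    return base, rules

def predict(model, x):
    """Sum the responses of all rules covering x, plus the default rule."""
    base, rules = model
    return base + sum(resp for a, thr, op, resp in rules
                      if (x[a] <= thr) == (op == "<="))
```

The shrinkage factor here plays the role of the regularization discussed in the paper: each rule contributes only a fraction of its fitted response, so later rules keep refitting the remaining residuals rather than a single rule dominating the prediction.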
References
Asuncion A, Newman DJ (2007) UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html
Bazan JG (1998) Discovery of decision rules by matching new objects against data tables. In: Polkowski L, Skowron A (eds) Rough sets and current trends in computing, volume 1424 of Lecture notes in artificial intelligence. Springer, Warsaw, pp 521–528
Błaszczyński J, Dembczyński K, Kotłowski W, Słowiński R, Szeląg M (2006) Ensembles of decision rules. Found Comput Decis Sci 31(3–4): 221–232
Boros E, Hammer PL, Ibaraki T, Kogan A, Mayoraz E, Muchnik I (2000) An implementation of logical analysis of data. IEEE Trans Knowl Data Eng 12: 292–306
Breiman L (1996) Bagging predictors. Mach Learn 24(2): 123–140
Brzezińska I, Greco S, Słowiński R (2007) Mining Pareto-optimal rules with respect to support and confirmation or support and anti-support. Eng Appl Artif Intell 20(5): 587–600
Clark P, Niblett T (1989) The CN2 induction algorithm. Mach Learn 3: 261–283
Cohen WW (1995) Fast effective rule induction. In: Proceedings of the twelfth international conference of machine learning (ICML 1995). Morgan Kaufmann, Tahoe City, pp 115–123
Cohen WW, Singer Y (1999) A simple, fast, and effective rule learner. In: Proceedings of the sixteenth national conference on artificial intelligence. AAAI Press/The MIT Press, Orlando, pp 335–342
Dembczyński K, Kotłowski W, Słowiński R (2008a) Maximum likelihood rule ensembles. In: Proceedings of the twenty-fifth international conference on machine learning (ICML 2008). Omnipress, Helsinki, pp 224–231
Dembczyński K, Kotłowski W, Słowiński R (2008b) Solving regression by learning an ensemble of decision rules. In: Rutkowski L, Tadeusiewicz R, Zadeh LA, Zurada JM (eds) Artificial intelligence and soft computing, volume 5097 of Lecture notes in artificial intelligence. Springer, Zakopane, pp 533–544
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7: 1–30
Dietterich TG (2000) An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Mach Learn 40(2): 139–158
Domingos P (1996) Unifying instance-based and rule-based induction. Mach Learn 24(2): 141–168
Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1): 119–139
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5): 1189–1232
Friedman JH, Popescu BE (2003) Importance sampled learning ensembles. Technical report, Department of Statistics, Stanford University
Friedman JH, Popescu BE (2008) Predictive learning via rule ensembles. Ann Appl Stat 2(3): 916–954
Friedman JH, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting (with discussion). Ann Stat 28(2): 337–407
Fürnkranz J (1996) Separate-and-conquer rule learning. Artif Intell Rev 13(1): 3–54
Góra G, Wojna A (2002a) Local attribute value grouping for lazy rule induction. In: Peters JF, Skowron A, Zhong N (eds) Rough sets and current trends in computing, volume 2475 of Lecture notes in artificial intelligence. Springer, Malvern, pp 405–412
Góra G, Wojna A (2002b) A new classification system combining rule induction and instance-based learning. Fundam Inform 54(4): 369–390
Greco S, Matarazzo B, Słowiński R, Stefanowski J (2000) An algorithm for induction of decision rules consistent with the dominance principle. In: Ziarko W, Yao Y (eds) Rough sets and current trends in computing, volume 2005 of Lecture notes in artificial intelligence. Springer, Banff, pp 304–313
Greco S, Matarazzo B, Słowiński R (2001) Rough sets theory for multicriteria decision analysis. Eur J Oper Res 129: 1–47
Greco S, Pawlak Z, Słowiński R (2004) Can Bayesian confirmation measures be useful for rough set decision rules. Eng Appl Artif Intell 17(4): 345–361
Grzymala-Busse JW (1992) LERS—a system for learning from examples based on rough sets. In: Słowiński R (ed) Intelligent decision support, handbook of applications and advances of the rough sets theory. Kluwer, Dordrecht, pp 3–18
Hastie T, Tibshirani R, Friedman JH (2003) Elements of statistical learning: data mining, inference, and prediction. Springer, New York
Hilderman RJ, Hamilton HJ (2001) Knowledge discovery and measures of interest. Kluwer, Boston
Janssen F, Fürnkranz J (2008) An empirical investigation of the trade-off between consistency and coverage in rule learning heuristics. In: Boulicaut J-F, Berthold MR, Horváth T (eds) Discovery science, volume 5255 of Lecture notes in artificial intelligence. Springer, Budapest, pp 40–51
Jovanoski V, Lavrac N (2001) Classification rule learning with APRIORI-C. In: Brazdil P, Jorge A (eds) Progress in artificial intelligence, volume 2258 of Lecture notes in artificial intelligence. Springer, Berlin, pp 111–135
Kearns MJ, Vazirani UV (1994) An introduction to computational learning theory. MIT Press, Cambridge
Knobbe A, Crémilleux B, Fürnkranz J, Scholz M (2008) From local patterns to global models: the LeGo approach to data mining. In: Fürnkranz J, Knobbe A (eds) Proceedings of the ECML/PKDD 2008 workshop “From local patterns to global models”, Antwerp, Belgium
Koltchinskii V, Panchenko D (2006) Complexities of convex combinations and bounding the generalization error in classification. Ann Stat 33(4): 1455–1496
Marchand M, Shawe-Taylor J (2002) The set covering machine. J Mach Learn Res 3: 723–746
Mason L, Baxter J, Bartlett P, Frean M (1999) Functional gradient techniques for combining hypotheses. In: Bartlett P, Schölkopf B, Schuurmans D, Smola AJ (eds) Advances in large margin classifiers. MIT Press, Cambridge, pp 33–58
Michalski RS (1983) A theory and methodology of inductive learning. In: Michalski RS, Carbonell JG, Mitchell TM (eds) Machine learning: an artificial intelligence approach. Tioga Publishing, Palo Alto, pp 83–129
Pawlak Z (1991) Rough sets. Theoretical aspects of reasoning about data. Kluwer, Dordrecht
Rückert U, Kramer S (2008) Margin-based first-order rule learning. Mach Learn 70(2–3): 189–206
Schapire RE, Singer Y (1999) Improved boosting algorithms using confidence-rated predictions. Mach Learn 37(3): 297–336
Schapire RE, Freund Y, Bartlett P, Lee WS (1998) Boosting the margin: a new explanation for the effectiveness of voting methods. Ann Stat 26(5): 1651–1686
Skowron A (1995) Extracting laws from decision tables—a rough set approach. Comput Intell 11: 371–388
Słowiński R (ed) (1992) Intelligent decision support. Handbook of applications and advances of the rough set theory. Kluwer, Dordrecht
Stefanowski J (1998) On rough set based approach to induction of decision rules. In: Skowron A, Polkowski L (eds) Rough set in knowledge discovering. Physica Verlag, Heidelberg, pp 500–529
Stefanowski J, Vanderpooten D (2001) Induction of decision rules in classification and discovery-oriented perspectives. Int J Intell Syst 16(1): 13–27
Weiss SM, Indurkhya N (2000) Lightweight rule induction. In: Proceedings of the seventeenth international conference on machine learning (ICML 2000). Morgan Kaufmann, Stanford, pp 1135–1142
Responsible editor: Johannes Fürnkranz and Arno Knobbe.
Cite this article
Dembczyński, K., Kotłowski, W. & Słowiński, R. ENDER: a statistical framework for boosting decision rules. Data Min Knowl Disc 21, 52–90 (2010). https://doi.org/10.1007/s10618-010-0177-7