
ENDER: a statistical framework for boosting decision rules

Data Mining and Knowledge Discovery

Abstract

Induction of decision rules plays an important role in machine learning. The main advantages of decision rules are their simplicity and human-interpretable form; moreover, they are capable of modeling complex interactions between attributes. In this paper, we thoroughly analyze a learning algorithm, called ENDER, which constructs an ensemble of decision rules. The algorithm is tailored to regression and binary classification problems. It uses the boosting approach for learning, which can be treated as a generalization of sequential covering: each new rule is fitted by focusing on the examples that were hardest to classify correctly by the rules already present in the ensemble. We consider different loss functions and minimization techniques often encountered in the boosting framework. The minimization techniques are used to derive impurity measures that control the construction of single decision rules. Properties of four different impurity measures are analyzed with respect to the trade-off between misclassification (discrimination) and coverage (completeness) of a rule. Moreover, we consider regularization by shrinkage and sampling. Finally, we compare the ENDER algorithm with other well-known decision rule learners such as SLIPPER, LRI and RuleFit.
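The boosting loop described above can be sketched for the squared-error (regression) case: each iteration fits one rule to the current residuals and adds its shrunken response to the ensemble prediction. This is a minimal illustration only, not the paper's implementation: the names (`fit_single_rule`, `ender_sketch`) are hypothetical, rules are restricted to a single condition, and the impurity measure is the plain squared-error reduction rather than the four measures analyzed in the paper.

```python
import numpy as np

def fit_single_rule(X, residuals):
    """Greedily choose one condition (feature, threshold, direction) whose
    covered examples' mean residual removes the most squared error."""
    best = None  # (gain, feature, threshold, direction, response)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for direction in ("<=", ">"):
                covered = X[:, j] <= t if direction == "<=" else X[:, j] > t
                n = covered.sum()
                if n == 0:
                    continue
                value = residuals[covered].mean()
                gain = n * value ** 2  # squared-error reduction of this rule
                if best is None or gain > best[0]:
                    best = (gain, j, t, direction, value)
    return best

def ender_sketch(X, y, n_rules=20, shrinkage=0.5):
    """Boosting as generalized sequential covering: every rule is fitted to
    the residuals left by the rules already in the ensemble."""
    pred = np.full(len(y), y.mean())  # default rule: constant prediction
    rules = []
    for _ in range(n_rules):
        _, j, t, d, v = fit_single_rule(X, y - pred)
        covered = X[:, j] <= t if d == "<=" else X[:, j] > t
        pred[covered] += shrinkage * v  # shrinkage regularizes each step
        rules.append((j, t, d, shrinkage * v))
    return pred, rules

# Toy regression problem: the target depends on a threshold on feature 0.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = np.where(X[:, 0] > 0.5, 2.0, -1.0) + 0.1 * rng.normal(size=200)

pred, rules = ender_sketch(X, y)
mse_const = np.mean((y - y.mean()) ** 2)
mse_rules = np.mean((y - pred) ** 2)
print(f"constant-model MSE: {mse_const:.3f}, rule-ensemble MSE: {mse_rules:.3f}")
```

The shrinkage factor plays the same role as in gradient boosting: smaller steps trade training-set fit for robustness, which is one of the regularization devices the abstract mentions (the other, sampling, would fit each rule on a random subsample).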


References

  • Asuncion A, Newman DJ (2007) UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html

  • Bazan JG (1998) Discovery of decision rules by matching new objects against data tables. In: Polkowski L, Skowron A (eds) Rough sets and current trends in computing, volume 1424 of Lecture notes in artificial intelligence. Springer, Warsaw, pp 521–528

  • Błaszczyński J, Dembczyński K, Kotłowski W, Słowiński R, Szeląg M (2006) Ensembles of decision rules. Found Comput Decis Sci 31(3–4): 221–232

  • Boros E, Hammer PL, Ibaraki T, Kogan A, Mayoraz E, Muchnik I (2000) An implementation of logical analysis of data. IEEE Trans Knowl Data Eng 12: 292–306

  • Breiman L (1996) Bagging predictors. Mach Learn 24(2): 123–140

  • Brzezińska I, Greco S, Słowiński R (2007) Mining Pareto-optimal rules with respect to support and confirmation or support and anti-support. Eng Appl Artif Intell 20(5): 587–600

  • Clark P, Niblett T (1989) The CN2 induction algorithm. Mach Learn 3: 261–283

  • Cohen WW (1995) Fast effective rule induction. In: Proceedings of the twelfth international conference on machine learning (ICML 1995). Morgan Kaufmann, Tahoe City, pp 115–123

  • Cohen WW, Singer Y (1999) A simple, fast, and effective rule learner. In: Proceedings of the sixteenth national conference on artificial intelligence. AAAI Press/The MIT Press, Orlando, pp 335–342

  • Dembczyński K, Kotłowski W, Słowiński R (2008a) Maximum likelihood rule ensembles. In: Proceedings of the twenty-fifth international conference on machine learning (ICML 2008). Omnipress, Helsinki, pp 224–231

  • Dembczyński K, Kotłowski W, Słowiński R (2008b) Solving regression by learning an ensemble of decision rules. In: Rutkowski L, Tadeusiewicz R, Zadeh LA, Zurada JM (eds) Artificial intelligence and soft computing, volume 5097 of Lecture notes in artificial intelligence. Springer, Zakopane, pp 533–544

  • Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7: 1–30

  • Dietterich TG (2000) An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Mach Learn 40(2): 139–158

  • Domingos P (1996) Unifying instance-based and rule-based induction. Mach Learn 24(2): 141–168

  • Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1): 119–139

  • Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5): 1189–1232

  • Friedman JH, Popescu BE (2003) Importance sampled learning ensembles. Technical report, Department of Statistics, Stanford University

  • Friedman JH, Popescu BE (2008) Predictive learning via rule ensembles. Ann Appl Stat 2(3): 916–954

  • Friedman JH, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting (with discussion). Ann Stat 28(2): 337–407

  • Fürnkranz J (1996) Separate-and-conquer rule learning. Artif Intell Rev 13(1): 3–54

  • Góra G, Wojna A (2002a) Local attribute value grouping for lazy rule induction. In: Peters JF, Skowron A, Zhong N (eds) Rough sets and current trends in computing, volume 2475 of Lecture notes in artificial intelligence. Springer, Malvern, pp 405–412

  • Góra G, Wojna A (2002b) A new classification system combining rule induction and instance-based learning. Fundam Inform 54(4): 369–390

  • Greco S, Matarazzo B, Słowiński R, Stefanowski J (2000) An algorithm for induction of decision rules consistent with the dominance principle. In: Ziarko W, Yao Y (eds) Rough sets and current trends in computing, volume 2005 of Lecture notes in artificial intelligence. Springer, Banff, pp 304–313

  • Greco S, Matarazzo B, Słowiński R (2001) Rough sets theory for multicriteria decision analysis. Eur J Oper Res 129: 1–47

  • Greco S, Pawlak Z, Słowiński R (2004) Can Bayesian confirmation measures be useful for rough set decision rules? Eng Appl Artif Intell 17(4): 345–361

  • Grzymala-Busse JW (1992) LERS—a system for learning from examples based on rough sets. In: Słowiński R (ed) Intelligent decision support, handbook of applications and advances of the rough sets theory. Kluwer, Dordrecht, pp 3–18

  • Hastie T, Tibshirani R, Friedman JH (2003) Elements of statistical learning: data mining, inference, and prediction. Springer, New York

  • Hilderman RJ, Hamilton HJ (2001) Knowledge discovery and measures of interest. Kluwer, Boston

  • Janssen F, Fürnkranz J (2008) An empirical investigation of the trade-off between consistency and coverage in rule learning heuristics. In: Boulicaut J-F, Berthold MR, Horváth T (eds) Discovery science, volume 5255 of Lecture notes in artificial intelligence. Springer, Budapest, pp 40–51

  • Jovanoski V, Lavrac N (2001) Classification rule learning with APRIORI-C. In: Brazdil P, Jorge A (eds) Progress in artificial intelligence, volume 2258 of Lecture notes in artificial intelligence. Springer, Berlin, pp 111–135

  • Kearns MJ, Vazirani UV (1994) An introduction to computational learning theory. MIT Press, Cambridge

  • Knobbe A, Crémilleux B, Fürnkranz J, Scholz M (2008) From local patterns to global models: the LeGo approach to data mining. In: Fürnkranz J, Knobbe A (eds) Proceedings of the ECML/PKDD 2008 workshop “From local patterns to global models”, Antwerp, Belgium

  • Koltchinskii V, Panchenko D (2006) Complexities of convex combinations and bounding the generalization error in classification. Ann Stat 33(4): 1455–1496

  • Marchand M, Shawe-Taylor J (2002) The set covering machine. J Mach Learn Res 3: 723–746

  • Mason L, Baxter J, Bartlett P, Frean M (1999) Functional gradient techniques for combining hypotheses. In: Bartlett P, Schölkopf B, Schuurmans D, Smola AJ (eds) Advances in large margin classifiers. MIT Press, Cambridge, pp 33–58

  • Michalski RS (1983) A theory and methodology of inductive learning. In: Michalski RS, Carbonell JG, Mitchell TM (eds) Machine learning: an artificial intelligence approach. Tioga Publishing, Palo Alto, pp 83–129

  • Pawlak Z (1991) Rough sets. Theoretical aspects of reasoning about data. Kluwer, Dordrecht

  • Rückert U, Kramer S (2008) Margin-based first-order rule learning. Mach Learn 70(2–3): 189–206

  • Schapire RE, Singer Y (1999) Improved boosting algorithms using confidence-rated predictions. Mach Learn 37(3): 297–336

  • Schapire RE, Freund Y, Bartlett P, Lee WS (1998) Boosting the margin: a new explanation for the effectiveness of voting methods. Ann Stat 26(5): 1651–1686

  • Skowron A (1995) Extracting laws from decision tables—a rough set approach. Comput Intell 11: 371–388

  • Słowiński R (ed) (1992) Intelligent decision support. Handbook of applications and advances of the rough set theory. Kluwer, Dordrecht

  • Stefanowski J (1998) On rough set based approach to induction of decision rules. In: Skowron A, Polkowski L (eds) Rough set in knowledge discovering. Physica Verlag, Heidelberg, pp 500–529

  • Stefanowski J, Vanderpooten D (2001) Induction of decision rules in classification and discovery-oriented perspectives. Int J Intell Syst 16(1): 13–27

  • Weiss SM, Indurkhya N (2000) Lightweight rule induction. In: Proceedings of the seventeenth international conference on machine learning (ICML 2000). Morgan Kaufmann, Stanford, pp 1135–1142


Author information

Correspondence to Krzysztof Dembczyński.

Additional information

Responsible editor: Johannes Fürnkranz and Arno Knobbe.


Cite this article

Dembczyński, K., Kotłowski, W. & Słowiński, R. ENDER: a statistical framework for boosting decision rules. Data Min Knowl Disc 21, 52–90 (2010). https://doi.org/10.1007/s10618-010-0177-7
