
SAT-based optimal classification trees for non-binary data


Abstract

Decision trees are a popular classification model in machine learning due to their interpretability and performance. Decision-tree classifiers are traditionally constructed using greedy heuristic algorithms that provide no guarantees on the quality of the resultant trees. In contrast, a recent line of work has employed exact optimization techniques to construct optimal decision-tree classifiers. However, most of these approaches are designed for datasets with binary features. While numeric and categorical features can be transformed into binary features, this transformation can introduce a large number of binary features and may not be efficient in practice. In this work, we present a SAT-based encoding for decision trees that directly supports non-binary data and use it to solve two well-studied variants of the optimal decision tree problem. Furthermore, we extend our approach to support cost-sensitive learning of optimal decision trees and introduce tree pruning constraints to reduce overfitting. Extensive experiments on real-world and synthetic datasets show that our approach outperforms state-of-the-art exact techniques on non-binary datasets and has significantly lower memory consumption. We also show that our extension for cost-sensitive learning and our tree pruning constraints can improve prediction quality on unseen test data.


Notes

  1. The paper suggests converting numeric features to categorical features with a limited number of categories via thresholding [16]; however, this conversion does not preserve the optimality of solutions w.r.t. the original numeric values.
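The thresholding conversion mentioned in this note can be illustrated with a small sketch (hypothetical code, not the procedure from [16]): each numeric value is mapped to the index of the bucket it falls into, so distinct values in the same bucket become indistinguishable, which is why optimality w.r.t. the original numeric values is not preserved.

```python
import bisect

def thresholdize(values, thresholds):
    """Map each numeric value to the index of its bucket.

    `thresholds` must be sorted; a value v lands in bucket i when
    thresholds[i-1] <= v < thresholds[i], with bucket 0 below the
    first threshold and the last bucket at or above the final one.
    """
    return [bisect.bisect_right(thresholds, v) for v in values]

# Illustrative example: ages bucketed into under-18 / adult / senior.
ages = [3, 17, 18, 42, 65]
print(thresholdize(ages, thresholds=[18, 65]))  # [0, 0, 1, 1, 2]
```

After the conversion, the values 18 and 42 are indistinguishable, so a tree that is optimal over the bucketed feature need not be optimal over the original numeric feature.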

  2. Note that since |X| is fixed, maximizing the number of correctly classified training examples is identical to maximizing the accuracy in Definition (6).

  3. Note that the root is by definition a branching node (Definition 1). However, in this procedure we treat it as a leaf node at the start. Since for any non-trivial dataset this procedure is guaranteed to at least iterate once, we are guaranteed to replace the leaf root node with a branching root node.

  4. Some categorical features induce a natural ordering and can therefore be represented as numeric features. For example, a categorical feature with the categories {Low, Medium, High} can be transformed to a numeric feature with the values {1, 2, 3}.
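A minimal sketch of the ordinal transformation described in this note (the mapping below is illustrative, not part of the paper's encoding):

```python
# Categories with a natural order are mapped to integers so that
# numeric threshold splits respect that ordering.
ORDER = {"Low": 1, "Medium": 2, "High": 3}

def ordinal_encode(column, order=ORDER):
    """Replace each ordered category with its integer rank."""
    return [order[c] for c in column]

print(ordinal_encode(["Medium", "Low", "High"]))  # [2, 1, 3]
```

A numeric split such as "feature <= 1" then groups the encoded values in a way that is consistent with the natural order of the original categories.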

  5. Similar to \(\alpha \), a well-formedness condition on \(\alpha _C\) would dictate that \(\forall t\in \Pi ^C: \alpha _C(t)\subseteq dom(\beta (t))\).

  6. Note that we could add clauses guaranteeing that at least one category goes to the right; however, they do not provide significant pruning, so we opted not to add them.

  7. Note that we could similarly replace the degenerate node with its left child; however, the resultant tree would not be a complete tree (Definition 3).

  8. Code obtained from https://gepgitlab.laas.fr/hhu/maxsat-decision-trees.

  9. Code obtained from https://bitbucket.org/helene_verhaeghe/classificationtree.

  10. Code obtained from https://github.com/aglingael/dl8.5.

  11. Code obtained from https://github.com/FlorentAvellaneda/InferDT.

  12. In our encoding, binary features are numeric features that take one of two possible values; however, we list them separately in Table 1 because they are supported by the baseline methods without transformation.

  13. Note that all approaches explore the same space of (feasible and) optimal decision-tree solutions.

References

  1. Aghaei, S., Azizi, M.J., & Vayanos, P. (2019). Learning optimal and fair decision trees for non-discriminative decision-making. In: AAAI Conference on Artificial Intelligence (AAAI) (pp. 1418–1426)

  2. Aglin, G., Nijssen, S., & Schaus, P. (2020). Learning optimal decision trees using caching branch-and-bound search. In: AAAI Conference on Artificial Intelligence (AAAI) (pp. 3146–3153)

  3. Angelino, E., Larus-Stone, N., Alabi, D., Seltzer, M., & Rudin, C. (2018). Learning certifiably optimal rule lists for categorical data. Journal of Machine Learning Research, 18, 1–78.


  4. Avellaneda, F. (2020). Efficient inference of optimal decision trees. In: AAAI Conference on Artificial Intelligence (AAAI) (pp. 3195–3202)

  5. Bennett, K. P. (1994). Global tree optimization: A non-greedy decision tree algorithm. Journal of Computing Science and Statistics, 26, 156–160.


  6. Berg, J., Demirović, E., & Stuckey, P.J. (2019). Core-boosted linear search for incomplete MaxSAT. In: International conference on integration of constraint programming, artificial intelligence, and operations research (CPAIOR) (pp. 39–56). Springer

  7. Bertsimas, D., & Dunn, J. (2017). Optimal classification trees. Machine Learning, 106(7), 1039–1082.


  8. Bessiere, C., Hebrard, E., & O’Sullivan, B. (2009). Minimising decision tree size as combinatorial optimisation. In: International conference on principles and practice of constraint programming (CP) (pp. 173–187). Springer

  9. Biere, A., Heule, M., & van Maaren, H. (2009). Handbook of satisfiability, vol. 185. IOS press

  10. Breiman, L., Friedman, J.H., Olshen, R.A., & Stone, C.J. (1984). Classification and regression trees. Wadsworth & Brooks/Cole Advanced Books & Software

  11. Cabodi, G., Camurati, P.E., Ignatiev, A., Marques-Silva, J., Palena, M., & Pasini, P. (2021). Optimizing binary decision diagrams for interpretable machine learning classification. In: 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE) (pp. 1122–1125). IEEE

  12. Dechter, R., & Mateescu, R. (2004). The impact of AND/OR search spaces on constraint satisfaction and counting. In: International conference on principles and practice of constraint programming (CP) (pp. 731–736)

  13. Dua, D., & Graff, C. (2017). UCI machine learning repository. http://archive.ics.uci.edu/ml

  14. Eén, N., & Sörensson, N. (2003). MiniSat SAT solver. http://minisat.se/Main.html

  15. Fu, Z., & Malik, S. (2006). On solving the partial MAX-SAT problem. In: International conference on theory and applications of satisfiability testing (SAT) (pp. 252–265). Springer

  16. Günlük, O., Kalagnanam, J., Li, M., Menickelly, M., & Scheinberg, K. (2021). Optimal decision trees for categorical data via integer programming. Journal of Global Optimization, 1–28

  17. Guyon, I. (2003). Design of experiments of the NIPS 2003 variable selection benchmark. In: NIPS 2003 workshop on feature extraction and feature selection, vol. 253

  18. Guyon, I., Bennett, K., Cawley, G., Escalante, H.J., Escalera, S., Ho, T.K., Macià, N., Ray, B., Saeed, M., & Statnikov, A., et al. (2015). Design of the 2015 ChaLearn AutoML challenge. In: 2015 International Joint Conference on Neural Networks (IJCNN) (pp. 1–8). IEEE

  19. Hancock, T., Jiang, T., Li, M., & Tromp, J. (1996). Lower bounds on learning decision lists and trees. Information and Computation, 126(2), 114–122.


  20. Hastie, T., Tibshirani, R., Friedman, J.H., & Friedman, J.H. (2009). The elements of statistical learning: data mining, inference, and prediction, vol. 2. Springer

  21. Hu, H., Siala, M., Hébrard, E., & Huguet, M.J. (2020). Learning optimal decision trees with MaxSAT and its integration in AdaBoost. In: International joint conference on artificial intelligence and pacific rim international conference on artificial intelligence (IJCAI-PRICAI)

  22. Ignatiev, A., Lam, E., Stuckey, P.J., & Marques-Silva, J. (2021). A scalable two stage approach to computing optimal decision sets. arXiv preprint arXiv:2102.01904

  23. Ignatiev, A., Marques-Silva, J., Narodytska, N., & Stuckey, P.J. (2021). Reasoning-based learning of interpretable ML models. In: International Joint Conference on Artificial Intelligence (IJCAI), in press

  24. Ignatiev, A., Pereira, F., Narodytska, N., & Marques-Silva, J. (2018). A SAT-based approach to learn explainable decision sets. In: International joint conference on automated reasoning (pp. 627–645). Springer

  25. Janota, M., & Morgado, A. (2020). SAT-based encodings for optimal decision trees with explicit paths. In: International conference on theory and applications of satisfiability testing (SAT) (pp. 501–518). Springer

  26. Kelleher, J.D., Mac Namee, B., & D’arcy, A. (2020). Fundamentals of machine learning for predictive data analytics: algorithms, worked examples, and case studies. MIT press

  27. Kotsiantis, S. B. (2013). Decision trees: a recent overview. Artificial Intelligence Review, 39(4), 261–283.


  28. Laurent, H., & Rivest, R. L. (1976). Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1), 15–17.


  29. Maloof, M.A. (2003). Learning when data sets are imbalanced and when costs are unequal and unknown. In: ICML-2003 workshop on learning from imbalanced data sets II, (vol. 2, pp. 2–1)

  30. Mosley, L. (2013). A balanced approach to the multi-class imbalance problem. Ph.D. thesis, Iowa State University

  31. Narodytska, N., Ignatiev, A., Pereira, F., & Marques-Silva, J. (2018). Learning optimal decision trees with SAT. In: International joint conference on artificial intelligence (IJCAI) (pp. 1362–1368)

  32. Nijssen, S., & Fromont, E. (2007). Mining optimal decision trees from itemset lattices. In: SIGKDD International conference on knowledge discovery and data mining (KDD) (pp. 530–539)

  33. Nijssen, S., & Fromont, E. (2010). Optimal constraint-based decision tree induction from itemset lattices. Data Mining and Knowledge Discovery, 21(1), 9–51.


  34. OscaR Team (2012). OscaR: Scala in OR. https://bitbucket.org/oscarlib/oscar

  35. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.


  36. Potdar, K., Pardawala, T. S., & Pai, C. D. (2017). A comparative study of categorical variable encoding techniques for neural network classifiers. International Journal Of Computer Applications, 175(4), 7–9.


  37. Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.


  38. Quinlan, J.R. (2014). C4.5: Programs for machine learning. Elsevier

  39. Rudin, C., & Ertekin, Ş. (2018). Learning customized and optimized lists of rules with mathematical programming. Mathematical Programming Computation, 10(4), 659–702.


  40. Schaus, P., Aoga, J.O., & Guns, T. (2017). Coversize: A global constraint for frequency-based itemset mining. In: International conference on principles and practice of constraint programming (CP) (pp. 529–546). Springer

  41. Shati, P., Cohen, E., & McIlraith, S. (2021). SAT-based approach for learning optimal decision trees with non-binary features. In: 27th International Conference on Principles and Practice of Constraint Programming (CP 2021). Schloss Dagstuhl-Leibniz-Zentrum für Informatik

  42. Sinz, C. (2005). Towards an optimal cnf encoding of boolean cardinality constraints. In: International conference on principles and practice of constraint programming (pp. 827–831). Springer

  43. Verhaeghe, H., Nijssen, S., Pesant, G., Quimper, C. G., & Schaus, P. (2020). Learning optimal decision trees using constraint programming. Constraints, 25(3), 226–250.


  44. Verwer, S., & Zhang, Y. (2019). Learning optimal classification trees using a binary linear program formulation. In: AAAI Conference on artificial intelligence (AAAI), (pp. 1625–1632)

  45. Weiss, G. M. (2004). Mining with rarity: a unifying framework. ACM Sigkdd Explorations Newsletter, 6(1), 7–19.


  46. Yu, J., Ignatiev, A., Le Bodic, P., & Stuckey, P.J. (2020). Optimal decision lists using SAT. arXiv preprint. arXiv:2010.09919

  47. Yu, J., Ignatiev, A., Stuckey, P.J., & Le Bodic, P. (2020). Computing optimal decision sets with SAT. In: International conference on principles and practice of constraint programming (CP) (pp. 952–970). Springer

  48. Yu, J., Ignatiev, A., Stuckey, P. J., & Le Bodic, P. (2021). Learning optimal decision sets and lists with SAT. Journal of Artificial Intelligence Research, 72, 1251–1279.


  49. Zhou, Z. H., & Liu, X. Y. (2005). Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Transactions on Knowledge And Data Engineering, 18(1), 63–77.



Funding

The authors gratefully acknowledge funding from NSERC, the CIFAR AI Chairs program (Vector Institute), and Microsoft Research.


Corresponding author

Correspondence to Eldan Cohen.

Ethics declarations

Conflicts of interest/Competing interests

The authors have no relevant conflicts of interest or competing interests to declare.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shati, P., Cohen, E. & McIlraith, S.A. SAT-based optimal classification trees for non-binary data. Constraints 28, 166–202 (2023). https://doi.org/10.1007/s10601-023-09348-1

