
SAT-based optimal classification trees for non-binary data


Abstract

Decision trees are a popular classification model in machine learning due to their interpretability and performance. Decision-tree classifiers are traditionally constructed using greedy heuristic algorithms that provide no guarantees on the quality of the resultant trees. In contrast, a recent line of work has employed exact optimization techniques to construct optimal decision-tree classifiers. However, most of these approaches are designed for datasets with binary features. While numeric and categorical features can be transformed into binary features, this transformation can introduce a large number of binary features and may not be efficient in practice. In this work, we present a SAT-based encoding for decision trees that directly supports non-binary data and use it to solve two well-studied variants of the optimal decision tree problem. Furthermore, we extend our approach to support cost-sensitive learning of optimal decision trees and introduce tree pruning constraints to reduce overfitting. Extensive experiments on real-world and synthetic datasets show that our approach outperforms state-of-the-art exact techniques on non-binary datasets and has significantly lower memory consumption. We also show that our extension for cost-sensitive learning and our tree pruning constraints can improve prediction quality on unseen test data.


Notes

  1. The paper suggests converting numeric features to categorical features with a limited number of categories via thresholding [16]; however, this conversion does not preserve the optimality of solutions w.r.t. the original numeric values.
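The thresholding conversion mentioned in this note can be illustrated with a small sketch (hypothetical code, not the procedure from [16]): each numeric value is mapped to the index of the bucket it falls into, so distinct values in the same bucket become indistinguishable, which is why optimality w.r.t. the original numeric values is not preserved.

```python
import bisect

def thresholdize(values, thresholds):
    """Map each numeric value to the index of its bucket.

    `thresholds` must be sorted; a value v lands in bucket i when
    thresholds[i-1] <= v < thresholds[i], with bucket 0 below the
    first threshold and the last bucket at or above the final one.
    """
    return [bisect.bisect_right(thresholds, v) for v in values]

# Illustrative example: ages bucketed into under-18 / adult / senior.
ages = [3, 17, 18, 42, 65]
print(thresholdize(ages, thresholds=[18, 65]))  # [0, 0, 1, 1, 2]
```

After the conversion, the values 18 and 42 are indistinguishable, so a tree that is optimal over the bucketed feature need not be optimal over the original numeric feature.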

  2. Note that since |X| is fixed, maximizing the number of correctly classified training examples is identical to maximizing the accuracy in Definition (6).

  3. Note that the root is by definition a branching node (Definition 1). However, in this procedure we treat it as a leaf node at the start. Since for any non-trivial dataset this procedure is guaranteed to at least iterate once, we are guaranteed to replace the leaf root node with a branching root node.

  4. Some categorical features induce a natural ordering and can therefore be represented as numeric features. For example, a categorical feature with the categories {Low, Medium, High} can be transformed to a numeric feature with the values {1, 2, 3}.
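A minimal sketch of the ordinal transformation described in this note (the mapping below is illustrative, not part of the paper's encoding):

```python
# Categories with a natural order are mapped to integers so that
# numeric threshold splits respect that ordering.
ORDER = {"Low": 1, "Medium": 2, "High": 3}

def ordinal_encode(column, order=ORDER):
    """Replace each ordered category with its integer rank."""
    return [order[c] for c in column]

print(ordinal_encode(["Medium", "Low", "High"]))  # [2, 1, 3]
```

A numeric split such as "feature <= 1" then groups the encoded values in a way that is consistent with the natural order of the original categories.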

  5. Similar to \(\alpha \), a well-formedness condition on \(\alpha _C\) would dictate that \(\forall t\in \Pi ^C: \alpha _C(t)\subseteq dom(\beta (t))\).

  6. Note that we could add clauses guaranteeing that at least one category goes to the right; however, they do not provide significant pruning, so we opted not to add them.

  7. Note that we could similarly replace the degenerate node with its left child; however, the resultant tree would not be a complete tree (Definition 3).

  8. Code obtained from https://gepgitlab.laas.fr/hhu/maxsat-decision-trees.

  9. Code obtained from https://bitbucket.org/helene_verhaeghe/classificationtree.

  10. Code obtained from https://github.com/aglingael/dl8.5.

  11. Code obtained from https://github.com/FlorentAvellaneda/InferDT.

  12. In our encoding, binary features are numeric features that take one of two possible values; however, we list them separately in Table 1 because they are supported by the baseline methods without transformation.

  13. Note that all approaches explore the same space of (feasible and) optimal decision-tree solutions.

References

  1. Aghaei, S., Azizi, M.J., & Vayanos, P. (2019). Learning optimal and fair decision trees for non-discriminative decision-making. In: AAAI Conference on Artificial Intelligence (AAAI) (pp. 1418–1426)

  2. Aglin, G., Nijssen, S., & Schaus, P. (2020). Learning optimal decision trees using caching branch-and-bound search. In: AAAI Conference on Artificial Intelligence (AAAI) (pp. 3146–3153)

  3. Angelino, E., Larus-Stone, N., Alabi, D., Seltzer, M., & Rudin, C. (2018). Learning certifiably optimal rule lists for categorical data. Journal of Machine Learning Research, 18, 1–78.


  4. Avellaneda, F. (2020). Efficient inference of optimal decision trees. In: AAAI Conference on Artificial Intelligence (AAAI) (pp. 3195–3202)

  5. Bennett, K. P. (1994). Global tree optimization: A non-greedy decision tree algorithm. Journal of Computing Science and Statistics, 26, 156–160.


  6. Berg, J., Demirović, E., & Stuckey, P.J. (2019). Core-boosted linear search for incomplete MaxSAT. In: International conference on integration of constraint programming, artificial intelligence, and operations research (CPAIOR) (pp. 39–56). Springer

  7. Bertsimas, D., & Dunn, J. (2017). Optimal classification trees. Machine Learning, 106(7), 1039–1082.


  8. Bessiere, C., Hebrard, E., & O’Sullivan, B. (2009). Minimising decision tree size as combinatorial optimisation. In: International conference on principles and practice of constraint programming (CP) (pp. 173–187). Springer

  9. Biere, A., Heule, M., & van Maaren, H. (2009). Handbook of satisfiability, vol. 185. IOS press

  10. Breiman, L., Friedman, J.H., Olshen, R.A., & Stone, C.J. (1984). Classification and regression trees. Wadsworth & Brooks/Cole Advanced Books & Software

  11. Cabodi, G., Camurati, P.E., Ignatiev, A., Marques-Silva, J., Palena, M., & Pasini, P. (2021). Optimizing binary decision diagrams for interpretable machine learning classification. In: 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE) (pp. 1122–1125). IEEE

  12. Dechter, R., & Mateescu, R. (2004). The impact of AND/OR search spaces on constraint satisfaction and counting. In: International conference on principles and practice of constraint programming (CP) (pp. 731–736)

  13. Dua, D., & Graff, C. (2017). UCI machine learning repository. http://archive.ics.uci.edu/ml

  14. Eén, N., & Sörensson, N. (2003). MiniSat SAT solver. http://minisat.se/Main.html

  15. Fu, Z., & Malik, S. (2006). On solving the partial MAX-SAT problem. In: International conference on theory and applications of satisfiability testing (SAT) (pp. 252–265). Springer

  16. Günlük, O., Kalagnanam, J., Li, M., Menickelly, M., & Scheinberg, K. (2021). Optimal decision trees for categorical data via integer programming. Journal of Global Optimization, 1–28

  17. Guyon, I. (2003). Design of experiments of the NIPS 2003 variable selection benchmark. In: NIPS 2003 workshop on feature extraction and feature selection, vol. 253

  18. Guyon, I., Bennett, K., Cawley, G., Escalante, H.J., Escalera, S., Ho, T.K., Macià, N., Ray, B., Saeed, M., & Statnikov, A., et al. (2015). Design of the 2015 ChaLearn AutoML challenge. In: 2015 International Joint Conference on Neural Networks (IJCNN) (pp. 1–8). IEEE

  19. Hancock, T., Jiang, T., Li, M., & Tromp, J. (1996). Lower bounds on learning decision lists and trees. Information and Computation, 126(2), 114–122.


  20. Hastie, T., Tibshirani, R., Friedman, J.H., & Friedman, J.H. (2009). The elements of statistical learning: data mining, inference, and prediction, vol. 2. Springer

  21. Hu, H., Siala, M., Hébrard, E., & Huguet, M.J. (2020). Learning optimal decision trees with MaxSAT and its integration in AdaBoost. In: International joint conference on artificial intelligence and pacific rim international conference on artificial intelligence (IJCAI-PRICAI)

  22. Ignatiev, A., Lam, E., Stuckey, P.J., & Marques-Silva, J. (2021). A scalable two stage approach to computing optimal decision sets. arXiv preprint arXiv:2102.01904

  23. Ignatiev, A., Marques-Silva, J., Narodytska, N., & Stuckey, P.J. (2021). Reasoning-based learning of interpretable ML models. In: International Joint Conference on Artificial Intelligence (IJCAI), in press

  24. Ignatiev, A., Pereira, F., Narodytska, N., & Marques-Silva, J. (2018). A SAT-based approach to learn explainable decision sets. In: International joint conference on automated reasoning (pp. 627–645). Springer

  25. Janota, M., & Morgado, A. (2020). SAT-based encodings for optimal decision trees with explicit paths. In: International conference on theory and applications of satisfiability testing (SAT) (pp. 501–518). Springer

  26. Kelleher, J.D., Mac Namee, B., & D’arcy, A. (2020). Fundamentals of machine learning for predictive data analytics: algorithms, worked examples, and case studies. MIT press

  27. Kotsiantis, S. B. (2013). Decision trees: a recent overview. Artificial Intelligence Review, 39(4), 261–283.


  28. Laurent, H., & Rivest, R. L. (1976). Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1), 15–17.


  29. Maloof, M.A. (2003). Learning when data sets are imbalanced and when costs are unequal and unknown. In: ICML-2003 workshop on learning from imbalanced data sets II, (vol. 2, pp. 2–1)

  30. Mosley, L. (2013). A balanced approach to the multi-class imbalance problem. Ph.D. thesis, Iowa State University

  31. Narodytska, N., Ignatiev, A., Pereira, F., & Marques-Silva, J. (2018). Learning optimal decision trees with SAT. In: International joint conference on artificial intelligence (IJCAI) (pp. 1362–1368)

  32. Nijssen, S., & Fromont, E. (2007). Mining optimal decision trees from itemset lattices. In: SIGKDD International conference on knowledge discovery and data mining (KDD) (pp. 530–539)

  33. Nijssen, S., & Fromont, E. (2010). Optimal constraint-based decision tree induction from itemset lattices. Data Mining and Knowledge Discovery, 21(1), 9–51.


  34. OscaR Team (2012). OscaR: Scala in OR. https://bitbucket.org/oscarlib/oscar

  35. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.


  36. Potdar, K., Pardawala, T. S., & Pai, C. D. (2017). A comparative study of categorical variable encoding techniques for neural network classifiers. International Journal Of Computer Applications, 175(4), 7–9.


  37. Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.


  38. Quinlan, J.R. (2014). C4.5: Programs for machine learning. Elsevier

  39. Rudin, C., & Ertekin, Ş. (2018). Learning customized and optimized lists of rules with mathematical programming. Mathematical Programming Computation, 10(4), 659–702.


  40. Schaus, P., Aoga, J.O., & Guns, T. (2017). Coversize: A global constraint for frequency-based itemset mining. In: International conference on principles and practice of constraint programming (CP) (pp. 529–546). Springer

  41. Shati, P., Cohen, E., & McIlraith, S. (2021). SAT-based approach for learning optimal decision trees with non-binary features. In: 27th International Conference on Principles and Practice of Constraint Programming (CP 2021). Schloss Dagstuhl-Leibniz-Zentrum für Informatik

  42. Sinz, C. (2005). Towards an optimal cnf encoding of boolean cardinality constraints. In: International conference on principles and practice of constraint programming (pp. 827–831). Springer

  43. Verhaeghe, H., Nijssen, S., Pesant, G., Quimper, C. G., & Schaus, P. (2020). Learning optimal decision trees using constraint programming. Constraints, 25(3), 226–250.


  44. Verwer, S., & Zhang, Y. (2019). Learning optimal classification trees using a binary linear program formulation. In: AAAI Conference on artificial intelligence (AAAI), (pp. 1625–1632)

  45. Weiss, G. M. (2004). Mining with rarity: a unifying framework. ACM Sigkdd Explorations Newsletter, 6(1), 7–19.


  46. Yu, J., Ignatiev, A., Le Bodic, P., & Stuckey, P.J. (2020). Optimal decision lists using SAT. arXiv preprint. arXiv:2010.09919

  47. Yu, J., Ignatiev, A., Stuckey, P.J., & Le Bodic, P. (2020). Computing optimal decision sets with SAT. In: International conference on principles and practice of constraint programming (CP) (pp. 952–970). Springer

  48. Yu, J., Ignatiev, A., Stuckey, P. J., & Le Bodic, P. (2021). Learning optimal decision sets and lists with SAT. Journal of Artificial Intelligence Research, 72, 1251–1279.


  49. Zhou, Z. H., & Liu, X. Y. (2005). Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Transactions on Knowledge And Data Engineering, 18(1), 63–77.



Funding

The authors gratefully acknowledge funding from NSERC, the CIFAR AI Chairs program (Vector Institute), and Microsoft Research.


Corresponding author

Correspondence to Eldan Cohen.

Ethics declarations

Conflicts of interest/Competing interests

The authors have no relevant conflicts of interest or competing interests to declare.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shati, P., Cohen, E. & McIlraith, S.A. SAT-based optimal classification trees for non-binary data. Constraints 28, 166–202 (2023). https://doi.org/10.1007/s10601-023-09348-1

