DOI: 10.1145/3632410.3632443

Research Article

Cost-Sensitive Trees for Interpretable Reinforcement Learning

Published: 04 January 2024

ABSTRACT

Trees have emerged as the most popular choice of intrinsically interpretable model for representing policies in reinforcement learning. However, learning a tree policy directly is difficult, so existing approaches instead train a neural network policy and use it to generate a dataset on which a tree-based model is trained in a supervised manner. These approaches assume that the action suggested by the neural network policy is the sole optimal action, and that all other actions are equally sub-optimal. This work presents a novel perspective by associating different costs with the prediction of different actions. By adopting a cost-sensitive approach to tree construction, we demonstrate that the resulting policies exhibit improved performance. To validate our findings, we develop cost-sensitive variants of two established methods, VIPER and MoET, and provide empirical evidence showcasing their superiority over the original methods across diverse environments.
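
To make the idea concrete, the sketch below shows tree distillation with cost-sensitive weighting in the spirit of the methods cited below: a trained Q-network's greedy policy is imitated by a decision tree (scikit-learn [19]), with each state weighted by its Q-value gap, the state-level scheme used by VIPER [1]. This is a minimal sketch under assumed inputs; the names (`states`, `q_values`, `distill_cost_sensitive`) are hypothetical and not from the paper, and the paper's approach further assigns distinct costs to individual actions, which would call for an example-dependent cost-sensitive learner [6, 28] rather than a single per-state weight.

```python
# Minimal sketch (hypothetical names): distill a Q-network's greedy policy
# into a decision tree, weighting each state by its Q-value gap so that
# states where a wrong action is costly dominate the tree's training loss.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def distill_cost_sensitive(states, q_values, max_depth=8):
    # states:   (N, d) observations gathered by rolling out the neural policy
    # q_values: (N, A) Q-network outputs for those observations
    actions = q_values.argmax(axis=1)  # teacher's greedy action labels
    # Per-state misprediction cost: the spread between the best and worst
    # action values (VIPER-style; states where all actions tie get weight 0).
    weights = q_values.max(axis=1) - q_values.min(axis=1)
    tree = DecisionTreeClassifier(max_depth=max_depth)
    tree.fit(states, actions, sample_weight=weights)
    return tree
```

At deployment, `tree.predict(observation.reshape(1, -1))` replaces the network's forward pass, so the policy remains fully inspectable.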

References

  1. Osbert Bastani, Yewen Pu, and Armando Solar-Lezama. 2018. Verifiable Reinforcement Learning via Policy Extraction. In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.). Vol. 31. Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2018/file/e6d8545daa42d5ced125a4bf747b3688-Paper.pdf
  2. Leo Breiman, Jerome Friedman, Charles J. Stone, and R. A. Olshen. 1984. Classification and Regression Trees. CRC Press.
  3. Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. OpenAI Gym. CoRR abs/1606.01540 (2016). arXiv:1606.01540. http://arxiv.org/abs/1606.01540
  4. David Chapman and Leslie Pack Kaelbling. 1991. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons. In IJCAI, Vol. 91. 726–731.
  5. Youri Coppens, Kyriakos Efthymiadis, Tom Lenaerts, and Ann Nowé. 2019. Distilling Deep Reinforcement Learning Policies in Soft Decision Trees. In International Joint Conference on Artificial Intelligence.
  6. Alejandro Correa Bahnsen, Djamila Aouada, and Björn Ottersten. 2015. Example-dependent cost-sensitive decision trees. Expert Systems with Applications 42, 19 (2015), 6609–6619. https://doi.org/10.1016/j.eswa.2015.04.042
  7. Zihan Ding, Pablo Hernandez-Leal, Gavin Weiguang Ding, Changjian Li, and Ruitong Huang. 2020. CDT: Cascading Decision Trees for Explainable Reinforcement Learning. CoRR abs/2011.07553 (2020). arXiv:2011.07553. https://arxiv.org/abs/2011.07553
  8. Charles Elkan. 2001. The foundations of cost-sensitive learning. In International Joint Conference on Artificial Intelligence, Vol. 17. Lawrence Erlbaum Associates Ltd, 973–978.
  9. Johan Huysmans, Karel Dejaeger, Christophe Mues, Jan Vanthienen, and Bart Baesens. 2011. An empirical evaluation of the comprehensibility of decision table, tree and rule based predictive models. Decision Support Systems 51, 1 (2011), 141–154. https://doi.org/10.1016/j.dss.2010.12.003
  10. Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. 1991. Adaptive Mixtures of Local Experts. Neural Computation 3, 1 (1991), 79–87. https://doi.org/10.1162/neco.1991.3.1.79
  11. Michael I. Jordan and Lei Xu. 1995. Convergence results for the EM approach to mixtures of experts architectures. Neural Networks 8, 9 (1995), 1409–1431. https://doi.org/10.1016/0893-6080(95)00014-3
  12. Edouard Leurent. 2018. An Environment for Autonomous Driving Decision-Making. https://github.com/eleurent/highway-env
  13. Charles X. Ling, Qiang Yang, Jianning Wang, and Shichao Zhang. 2004. Decision trees with minimal costs. In Proceedings of the Twenty-First International Conference on Machine Learning. 69.
  14. Guiliang Liu, Oliver Schulte, Wang Zhu, and Qingcan Li. 2019. Toward interpretable deep reinforcement learning with linear model U-trees. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2018, Dublin, Ireland, September 10–14, 2018, Proceedings, Part II. Springer, 414–429.
  15. Andrew Kachites McCallum. 1996. Learning to use selective attention and short-term memory in sequential tasks. In From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, Vol. 4. MIT Press, Cambridge, 315.
  16. R. Andrew McCallum. 1995. Instance-based utile distinctions for reinforcement learning with hidden state. In Machine Learning Proceedings 1995. Elsevier, 387–395.
  17. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, et al. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529–533.
  18. Michael Pazzani, Christopher Merz, Patrick Murphy, Kamal Ali, Timothy Hume, and Clifford Brunk. 1994. Reducing misclassification costs. In Machine Learning Proceedings 1994. Elsevier, 217–225.
  19. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
  20. Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. 2021. Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research 22, 268 (2021), 1–8. http://jmlr.org/papers/v22/20-1364.html
  21. Stephane Ross, Geoffrey Gordon, and Drew Bagnell. 2011. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (Proceedings of Machine Learning Research, Vol. 15), Geoffrey Gordon, David Dunson, and Miroslav Dudík (Eds.). PMLR, Fort Lauderdale, FL, USA, 627–635. https://proceedings.mlr.press/v15/ross11a.html
  22. Aaron M. Roth, Nicholay Topin, Pooyan Jamshidi, and Manuela Veloso. 2019. Conservative Q-improvement: Reinforcement learning for an interpretable decision-tree policy. arXiv preprint arXiv:1907.01180 (2019).
  23. Andrew Silva, Matthew Gombolay, Taylor Killian, Ivan Jimenez, and Sung-Hyun Son. 2020. Optimization methods for interpretable differentiable decision trees applied to reinforcement learning. In International Conference on Artificial Intelligence and Statistics. PMLR, 1855–1865.
  24. Richard S. Sutton, Doina Precup, and Satinder Singh. 1999. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112, 1 (1999), 181–211. https://doi.org/10.1016/S0004-3702(99)00052-1
  25. Kai Ming Ting. 2002. An instance-weighting method to induce cost-sensitive trees. IEEE Transactions on Knowledge and Data Engineering 14, 3 (2002), 659–665. https://doi.org/10.1109/TKDE.2002.1000348
  26. William T. B. Uther and Manuela M. Veloso. 1998. Tree based discretization for continuous state space reinforcement learning. AAAI/IAAI 98 (1998), 769–774.
  27. Marko Vasić, Andrija Petrović, Kaiyuan Wang, Mladen Nikolić, Rishabh Singh, and Sarfraz Khurshid. 2022. MoËT: Mixture of Expert Trees and its application to verifiable reinforcement learning. Neural Networks 151 (2022), 34–47. https://doi.org/10.1016/j.neunet.2022.03.022
  28. B. Zadrozny, J. Langford, and N. Abe. 2003. Cost-sensitive learning by cost-proportionate example weighting. In Third IEEE International Conference on Data Mining. 435–442. https://doi.org/10.1109/ICDM.2003.1250950
  29. Zhi-Hua Zhou and Xu-Ying Liu. 2006. Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Transactions on Knowledge and Data Engineering 18, 1 (2006), 63–77. https://doi.org/10.1109/TKDE.2006.17

Published in

CODS-COMAD '24: Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD). January 2024. 627 pages.

Copyright © 2024 ACM. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States.

