ABSTRACT
Trees have emerged as the most popular class of intrinsically interpretable models for representing policies in reinforcement learning. However, learning a tree policy directly is challenging, so existing approaches first train a neural network policy and use it to generate a dataset on which a tree-based model is trained in a supervised manner. These approaches implicitly assume that the action suggested by the neural network policy is the sole optimal action and that all other actions are equally sub-optimal. This work presents a novel perspective that associates different costs with the prediction of different actions. By adopting a cost-sensitive approach to tree construction, we demonstrate that the resulting policies exhibit improved performance. To validate our findings, we develop cost-sensitive variants of two established methods, VIPER and MoËT, and provide empirical evidence of their superiority over the original methods across diverse environments.
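To make the cost-sensitive idea concrete: one common way to express per-action misclassification costs in tree learning (used, in related form, by VIPER's Q-gap reweighting) is to weight each state by how much reward is at stake there, so the tree learner prioritizes states where predicting the wrong action is expensive. The sketch below is illustrative only — the function name, the toy data, and the choice of the Q-value gap as the weight are assumptions for exposition, not the paper's exact construction.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_cost_sensitive_tree(states, q_values, max_depth=4):
    """Fit a tree policy where each state is weighted by the cost of acting
    sub-optimally there (here: the Q-value gap), instead of treating all
    misclassifications as equally bad."""
    actions = q_values.argmax(axis=1)                   # teacher's greedy action
    gap = q_values.max(axis=1) - q_values.min(axis=1)   # worst-case regret per state
    tree = DecisionTreeClassifier(max_depth=max_depth)
    tree.fit(states, actions, sample_weight=gap)        # cost-proportionate weighting
    return tree

# Toy example with 2 actions: states where the wrong action is costly
# (large |feature 0|) receive large weight; near-indifferent states barely count.
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 3))
q_values = np.stack([states[:, 0], -states[:, 0]], axis=1)
policy = fit_cost_sensitive_tree(states, q_values)
```

Weighting samples this way reduces cost-sensitive learning to ordinary weighted tree induction, in the spirit of cost-proportionate example weighting (Zadrozny et al. 2003) and instance-weighted trees (Ting 2002) cited below.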
- Osbert Bastani, Yewen Pu, and Armando Solar-Lezama. 2018. Verifiable Reinforcement Learning via Policy Extraction. In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.). Vol. 31. Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2018/file/e6d8545daa42d5ced125a4bf747b3688-Paper.pdf
- Leo Breiman, Jerome Friedman, Charles J Stone, and R. A. Olshen. 1984. Classification and Regression Trees. CRC Press.
- Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. OpenAI Gym. CoRR abs/1606.01540 (2016). arXiv:1606.01540 http://arxiv.org/abs/1606.01540
- David Chapman and Leslie Pack Kaelbling. 1991. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons. In IJCAI, Vol. 91. 726–731.
- Youri Coppens, Kyriakos Efthymiadis, Tom Lenaerts, and Ann Nowé. 2019. Distilling Deep Reinforcement Learning Policies in Soft Decision Trees. In International Joint Conference on Artificial Intelligence.
- Alejandro Correa Bahnsen, Djamila Aouada, and Björn Ottersten. 2015. Example-dependent cost-sensitive decision trees. Expert Systems with Applications 42, 19 (2015), 6609–6619. https://doi.org/10.1016/j.eswa.2015.04.042
- Zihan Ding, Pablo Hernandez-Leal, Gavin Weiguang Ding, Changjian Li, and Ruitong Huang. 2020. CDT: Cascading Decision Trees for Explainable Reinforcement Learning. CoRR abs/2011.07553 (2020). arXiv:2011.07553 https://arxiv.org/abs/2011.07553
- Charles Elkan. 2001. The foundations of cost-sensitive learning. In International Joint Conference on Artificial Intelligence, Vol. 17. Lawrence Erlbaum Associates Ltd, 973–978.
- Johan Huysmans, Karel Dejaeger, Christophe Mues, Jan Vanthienen, and Bart Baesens. 2011. An empirical evaluation of the comprehensibility of decision table, tree and rule based predictive models. Decision Support Systems 51, 1 (2011), 141–154. https://doi.org/10.1016/j.dss.2010.12.003
- Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. 1991. Adaptive Mixtures of Local Experts. Neural Computation 3, 1 (1991), 79–87. https://doi.org/10.1162/neco.1991.3.1.79
- Michael I. Jordan and Lei Xu. 1995. Convergence results for the EM approach to mixtures of experts architectures. Neural Networks 8, 9 (1995), 1409–1431. https://doi.org/10.1016/0893-6080(95)00014-3
- Edouard Leurent. 2018. An Environment for Autonomous Driving Decision-Making. https://github.com/eleurent/highway-env.
- Charles X Ling, Qiang Yang, Jianning Wang, and Shichao Zhang. 2004. Decision trees with minimal costs. In Proceedings of the Twenty-First International Conference on Machine Learning. 69.
- Guiliang Liu, Oliver Schulte, Wang Zhu, and Qingcan Li. 2019. Toward interpretable deep reinforcement learning with linear model u-trees. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2018, Dublin, Ireland, September 10–14, 2018, Proceedings, Part II 18. Springer, 414–429.
- Andrew Kachites McCallum. 1996. Learning to use selective attention and short-term memory in sequential tasks. In From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, Vol. 4. MIT Press, Cambridge, 315.
- R Andrew McCallum. 1995. Instance-based utile distinctions for reinforcement learning with hidden state. In Machine Learning Proceedings 1995. Elsevier, 387–395.
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529–533.
- Michael Pazzani, Christopher Merz, Patrick Murphy, Kamal Ali, Timothy Hume, and Clifford Brunk. 1994. Reducing misclassification costs. In Machine Learning Proceedings 1994. Elsevier, 217–225.
- F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
- Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. 2021. Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research 22, 268 (2021), 1–8. http://jmlr.org/papers/v22/20-1364.html
- Stephane Ross, Geoffrey Gordon, and Drew Bagnell. 2011. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (Proceedings of Machine Learning Research, Vol. 15), Geoffrey Gordon, David Dunson, and Miroslav Dudík (Eds.). PMLR, Fort Lauderdale, FL, USA, 627–635. https://proceedings.mlr.press/v15/ross11a.html
- Aaron M Roth, Nicholay Topin, Pooyan Jamshidi, and Manuela Veloso. 2019. Conservative q-improvement: Reinforcement learning for an interpretable decision-tree policy. arXiv preprint arXiv:1907.01180 (2019).
- Andrew Silva, Matthew Gombolay, Taylor Killian, Ivan Jimenez, and Sung-Hyun Son. 2020. Optimization methods for interpretable differentiable decision trees applied to reinforcement learning. In International Conference on Artificial Intelligence and Statistics. PMLR, 1855–1865.
- Richard S. Sutton, Doina Precup, and Satinder Singh. 1999. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112, 1 (1999), 181–211. https://doi.org/10.1016/S0004-3702(99)00052-1
- Kai Ming Ting. 2002. An instance-weighting method to induce cost-sensitive trees. IEEE Transactions on Knowledge and Data Engineering 14, 3 (2002), 659–665. https://doi.org/10.1109/TKDE.2002.1000348
- William TB Uther and Manuela M Veloso. 1998. Tree based discretization for continuous state space reinforcement learning. AAAI/IAAI 98 (1998), 769–774.
- Marko Vasić, Andrija Petrović, Kaiyuan Wang, Mladen Nikolić, Rishabh Singh, and Sarfraz Khurshid. 2022. MoËT: Mixture of Expert Trees and its application to verifiable reinforcement learning. Neural Networks 151 (2022), 34–47. https://doi.org/10.1016/j.neunet.2022.03.022
- B. Zadrozny, J. Langford, and N. Abe. 2003. Cost-sensitive learning by cost-proportionate example weighting. In Third IEEE International Conference on Data Mining. 435–442. https://doi.org/10.1109/ICDM.2003.1250950
- Zhi-Hua Zhou and Xu-Ying Liu. 2006. Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Transactions on Knowledge and Data Engineering 18, 1 (2006), 63–77. https://doi.org/10.1109/TKDE.2006.17