ABSTRACT
Trees have emerged as the most popular class of intrinsically interpretable models for representing policies in reinforcement learning. However, learning a tree policy directly is challenging, so existing approaches first train a neural network policy and use it to generate a dataset on which a tree-based model is trained in a supervised manner. These approaches implicitly assume that the action suggested by the neural network policy is the sole optimal action and that all other actions are equally sub-optimal. This work presents a novel perspective that associates different costs with the prediction of different actions. By adopting a cost-sensitive approach to tree construction, we demonstrate that the resulting policies exhibit improved performance. To validate our findings, we develop cost-sensitive variants of two established methods, VIPER and MoËT, and provide empirical evidence of their superiority over the original methods across diverse environments.
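To make the cost-sensitive idea concrete: one common way to express per-action misclassification costs in tree learning (used, in related form, by VIPER's Q-gap reweighting) is to weight each state by how much reward is at stake there, so the tree learner prioritizes states where predicting the wrong action is expensive. The sketch below is illustrative only — the function name, the toy data, and the choice of the Q-value gap as the weight are assumptions for exposition, not the paper's exact construction.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_cost_sensitive_tree(states, q_values, max_depth=4):
    """Fit a tree policy where each state is weighted by the cost of acting
    sub-optimally there (here: the Q-value gap), instead of treating all
    misclassifications as equally bad."""
    actions = q_values.argmax(axis=1)                   # teacher's greedy action
    gap = q_values.max(axis=1) - q_values.min(axis=1)   # worst-case regret per state
    tree = DecisionTreeClassifier(max_depth=max_depth)
    tree.fit(states, actions, sample_weight=gap)        # cost-proportionate weighting
    return tree

# Toy example with 2 actions: states where the wrong action is costly
# (large |feature 0|) receive large weight; near-indifferent states barely count.
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 3))
q_values = np.stack([states[:, 0], -states[:, 0]], axis=1)
policy = fit_cost_sensitive_tree(states, q_values)
```

Weighting samples this way reduces cost-sensitive learning to ordinary weighted tree induction, in the spirit of cost-proportionate example weighting (Zadrozny et al. 2003) and instance-weighted trees (Ting 2002) cited below.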
- Osbert Bastani, Yewen Pu, and Armando Solar-Lezama. 2018. Verifiable Reinforcement Learning via Policy Extraction. In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.). Vol. 31. Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2018/file/e6d8545daa42d5ced125a4bf747b3688-Paper.pdf
- Leo Breiman, Jerome Friedman, Charles J Stone, and R. A. Olshen. 1984. Classification and Regression Trees. CRC Press.
- Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. OpenAI Gym. CoRR abs/1606.01540 (2016). arXiv:1606.01540 http://arxiv.org/abs/1606.01540
- David Chapman and Leslie Pack Kaelbling. 1991. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons. In IJCAI, Vol. 91. 726–731.
- Youri Coppens, Kyriakos Efthymiadis, Tom Lenaerts, and Ann Nowé. 2019. Distilling Deep Reinforcement Learning Policies in Soft Decision Trees. In International Joint Conference on Artificial Intelligence.
- Alejandro Correa Bahnsen, Djamila Aouada, and Björn Ottersten. 2015. Example-dependent cost-sensitive decision trees. Expert Systems with Applications 42, 19 (2015), 6609–6619. https://doi.org/10.1016/j.eswa.2015.04.042
- Zihan Ding, Pablo Hernandez-Leal, Gavin Weiguang Ding, Changjian Li, and Ruitong Huang. 2020. CDT: Cascading Decision Trees for Explainable Reinforcement Learning. CoRR abs/2011.07553 (2020). arXiv:2011.07553 https://arxiv.org/abs/2011.07553
- Charles Elkan. 2001. The foundations of cost-sensitive learning. In International Joint Conference on Artificial Intelligence, Vol. 17. Lawrence Erlbaum Associates Ltd, 973–978.
- Johan Huysmans, Karel Dejaeger, Christophe Mues, Jan Vanthienen, and Bart Baesens. 2011. An empirical evaluation of the comprehensibility of decision table, tree and rule based predictive models. Decision Support Systems 51, 1 (2011), 141–154. https://doi.org/10.1016/j.dss.2010.12.003
- Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. 1991. Adaptive Mixtures of Local Experts. Neural Computation 3, 1 (1991), 79–87. https://doi.org/10.1162/neco.1991.3.1.79
- Michael I. Jordan and Lei Xu. 1995. Convergence results for the EM approach to mixtures of experts architectures. Neural Networks 8, 9 (1995), 1409–1431. https://doi.org/10.1016/0893-6080(95)00014-3
- Edouard Leurent. 2018. An Environment for Autonomous Driving Decision-Making. https://github.com/eleurent/highway-env.
- Charles X Ling, Qiang Yang, Jianning Wang, and Shichao Zhang. 2004. Decision trees with minimal costs. In Proceedings of the Twenty-First International Conference on Machine Learning. 69.
- Guiliang Liu, Oliver Schulte, Wang Zhu, and Qingcan Li. 2019. Toward interpretable deep reinforcement learning with linear model u-trees. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2018, Dublin, Ireland, September 10–14, 2018, Proceedings, Part II 18. Springer, 414–429.
- Andrew Kachites McCallum. 1996. Learning to use selective attention and short-term memory in sequential tasks. In From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, Vol. 4. MIT Press, Cambridge, 315.
- R Andrew McCallum. 1995. Instance-based utile distinctions for reinforcement learning with hidden state. In Machine Learning Proceedings 1995. Elsevier, 387–395.
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529–533.
- Michael Pazzani, Christopher Merz, Patrick Murphy, Kamal Ali, Timothy Hume, and Clifford Brunk. 1994. Reducing misclassification costs. In Machine Learning Proceedings 1994. Elsevier, 217–225.
- F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
- Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. 2021. Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research 22, 268 (2021), 1–8. http://jmlr.org/papers/v22/20-1364.html
- Stephane Ross, Geoffrey Gordon, and Drew Bagnell. 2011. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (Proceedings of Machine Learning Research, Vol. 15), Geoffrey Gordon, David Dunson, and Miroslav Dudík (Eds.). PMLR, Fort Lauderdale, FL, USA, 627–635. https://proceedings.mlr.press/v15/ross11a.html
- Aaron M Roth, Nicholay Topin, Pooyan Jamshidi, and Manuela Veloso. 2019. Conservative q-improvement: Reinforcement learning for an interpretable decision-tree policy. arXiv preprint arXiv:1907.01180 (2019).
- Andrew Silva, Matthew Gombolay, Taylor Killian, Ivan Jimenez, and Sung-Hyun Son. 2020. Optimization methods for interpretable differentiable decision trees applied to reinforcement learning. In International Conference on Artificial Intelligence and Statistics. PMLR, 1855–1865.
- Richard S. Sutton, Doina Precup, and Satinder Singh. 1999. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112, 1 (1999), 181–211. https://doi.org/10.1016/S0004-3702(99)00052-1
- Kai Ming Ting. 2002. An instance-weighting method to induce cost-sensitive trees. IEEE Transactions on Knowledge and Data Engineering 14, 3 (2002), 659–665. https://doi.org/10.1109/TKDE.2002.1000348
- William TB Uther and Manuela M Veloso. 1998. Tree based discretization for continuous state space reinforcement learning. AAAI/IAAI 98 (1998), 769–774.
- Marko Vasić, Andrija Petrović, Kaiyuan Wang, Mladen Nikolić, Rishabh Singh, and Sarfraz Khurshid. 2022. MoËT: Mixture of Expert Trees and its application to verifiable reinforcement learning. Neural Networks 151 (2022), 34–47. https://doi.org/10.1016/j.neunet.2022.03.022
- B. Zadrozny, J. Langford, and N. Abe. 2003. Cost-sensitive learning by cost-proportionate example weighting. In Third IEEE International Conference on Data Mining. 435–442. https://doi.org/10.1109/ICDM.2003.1250950
- Zhi-Hua Zhou and Xu-Ying Liu. 2006. Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Transactions on Knowledge and Data Engineering 18, 1 (2006), 63–77. https://doi.org/10.1109/TKDE.2006.17