π-Light: Programmatic Interpretable Reinforcement Learning for Resource-Limited Traffic Signal Control

Authors

  • Yin Gu, Anhui Province Key Laboratory of Big Data Analysis and Application, School of Data Science & School of Computer Science and Technology, University of Science and Technology of China; State Key Laboratory of Cognitive Intelligence
  • Kai Zhang, Anhui Province Key Laboratory of Big Data Analysis and Application, School of Data Science & School of Computer Science and Technology, University of Science and Technology of China; State Key Laboratory of Cognitive Intelligence
  • Qi Liu, Anhui Province Key Laboratory of Big Data Analysis and Application, School of Data Science & School of Computer Science and Technology, University of Science and Technology of China; State Key Laboratory of Cognitive Intelligence
  • Weibo Gao, Anhui Province Key Laboratory of Big Data Analysis and Application, School of Data Science & School of Computer Science and Technology, University of Science and Technology of China; State Key Laboratory of Cognitive Intelligence
  • Longfei Li, Ant Financial Services Group
  • Jun Zhou, Ant Financial Services Group

DOI:

https://doi.org/10.1609/aaai.v38i19.30103

Keywords:

General

Abstract

Recent advancements in Deep Reinforcement Learning (DRL) have significantly enhanced the performance of adaptive Traffic Signal Control (TSC). However, DRL policies are typically represented by neural networks, which are over-parameterized black-box models. As a result, the learned policies often lack interpretability and cannot be deployed directly on real-world edge hardware due to resource constraints. In addition, DRL methods often generalize poorly, struggling to transfer a learned policy to other geographical regions. These factors limit the practical application of learning-based approaches. To address these issues, we suggest the use of an inherently interpretable program for representing the control policy. We present a new approach, Programmatic Interpretable reinforcement learning for traffic signal control (π-Light), designed to autonomously discover non-differentiable programs. Specifically, we define a Domain Specific Language (DSL) and transformation rules for constructing programs, and utilize Monte Carlo Tree Search (MCTS) to find the optimal program in a discrete space. Extensive experiments demonstrate that our method consistently outperforms baseline approaches. Moreover, π-Light exhibits superior generalization capabilities compared to DRL, enabling training and evaluation across intersections from different cities. Finally, we analyze how the learned program policies can be deployed directly on edge devices with extremely limited resources.
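To make the abstract's core idea concrete, here is a minimal, self-contained sketch of searching a discrete program space for an interpretable control policy. It is NOT the paper's actual DSL or search procedure: the toy DSL (programs mapping lane queue lengths to a signal phase), the `episodes` data, and all function names are invented for illustration, and a UCB1 bandit (the selection rule at the core of MCTS) stands in for the full tree search.

```python
import math

N_LANES = 2  # toy intersection with two incoming lanes

def evaluate(prog, queues):
    """Interpret a DSL program: ('phase', k) or ('if', i, j, then_p, else_p)."""
    if prog[0] == "phase":
        return prog[1]
    _, i, j, then_p, else_p = prog
    return evaluate(then_p if queues[i] > queues[j] else else_p, queues)

def enumerate_programs():
    """All depth-<=1 programs over two lanes: 2 leaves + 8 conditionals."""
    leaves = [("phase", k) for k in range(N_LANES)]
    progs = list(leaves)
    for i, j in ((0, 1), (1, 0)):
        for a in leaves:
            for b in leaves:
                progs.append(("if", i, j, a, b))
    return progs

def episode_reward(prog, episodes):
    """Fraction of observations where the program serves the longest queue."""
    hits = sum(evaluate(prog, q) == max(range(N_LANES), key=q.__getitem__)
               for q in episodes)
    return hits / len(episodes)

def ucb1_search(episodes, iters=200, c=1.4):
    """UCB1 bandit over candidate programs; returns the best-scoring one."""
    progs = enumerate_programs()
    counts = [0] * len(progs)
    totals = [0.0] * len(progs)
    for t in range(1, iters + 1):
        # Pull the arm with the highest upper confidence bound
        # (unvisited arms are pulled first).
        idx = max(range(len(progs)),
                  key=lambda k: float("inf") if counts[k] == 0
                  else totals[k] / counts[k]
                  + c * math.sqrt(math.log(t) / counts[k]))
        counts[idx] += 1
        totals[idx] += episode_reward(progs[idx], episodes)
    best = max(range(len(progs)),
               key=lambda k: totals[k] / counts[k] if counts[k] else -1.0)
    return progs[best]

# Toy "traffic data": observed queue lengths for the two incoming lanes.
episodes = [(3, 1), (0, 5), (7, 2), (1, 4), (6, 6.5)]
best = ucb1_search(episodes)
print(best)  # -> ('if', 0, 1, ('phase', 0), ('phase', 1))
```

The returned program is directly human-readable ("if lane 0's queue exceeds lane 1's, serve phase 0, else phase 1") and runs with a handful of comparisons and no floating-point matrix math, which illustrates why programmatic policies suit resource-limited edge hardware.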

Published

2024-03-24

How to Cite

Gu, Y., Zhang, K., Liu, Q., Gao, W., Li, L., & Zhou, J. (2024). π-Light: Programmatic Interpretable Reinforcement Learning for Resource-Limited Traffic Signal Control. Proceedings of the AAAI Conference on Artificial Intelligence, 38(19), 21107-21115. https://doi.org/10.1609/aaai.v38i19.30103

Issue

Section

AAAI Technical Track on Safe, Robust and Responsible AI Track