Abstract
In contemporary urban areas, traffic signal control remains very difficult. Multi-agent reinforcement learning (MARL) is a promising way to address this problem. However, most MARL algorithms cannot effectively transfer learned strategies when the number of agents increases or decreases. This paper proposes a new MARL algorithm, cooperative dynamic delay updating twin delayed deep deterministic policy gradient based on the exponentially weighted moving average (CoTD3-EWMA), to solve this problem. By introducing mean-field theory, the algorithm implicitly models the interaction between the agents and the environment, which reduces the dimension of the action space and improves the scalability of the algorithm. In addition, we propose a dynamic delay updating method based on the exponentially weighted moving average (EWMA), which mitigates the Q-value overestimation problem of the traditional TD3 algorithm. Moreover, a joint reward allocation mechanism and a state sharing mechanism are proposed to improve the agents' global strategy learning ability and robustness. Simulation results show that the new algorithm outperforms current state-of-the-art algorithms: it effectively reduces vehicle delay time and improves the efficiency of the traffic network.
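The abstract names two mechanisms without giving their formulas: an EWMA that drives a dynamic update delay, and a mean-field view of the other agents. The sketch below illustrates the general ideas only, under stated assumptions. The EWMA recursion is the standard one; the class `EwmaDelay`, its parameters, the rule that maps the smoothed value to a delay, and the helper `mean_neighbor_action` are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EwmaDelay:
    """Track an EWMA of a scalar signal (e.g. a critic error) and map it
    to an actor-update delay. The mapping is a hypothetical illustration,
    not the rule used by CoTD3-EWMA."""
    beta: float = 0.9        # smoothing factor (assumed value)
    min_delay: int = 1       # shortest delay between actor updates (assumed)
    max_delay: int = 4       # longest delay between actor updates (assumed)
    threshold: float = 1.0   # EWMA level at which the delay saturates (assumed)
    ewma: float = 0.0

    def update(self, value: float) -> float:
        # Standard EWMA recursion: v_t = beta * v_{t-1} + (1 - beta) * x_t
        self.ewma = self.beta * self.ewma + (1.0 - self.beta) * value
        return self.ewma

    def delay(self) -> int:
        # Hypothetical rule: a larger smoothed error gives a longer delay,
        # clipped to the range [min_delay, max_delay].
        frac = min(max(self.ewma / self.threshold, 0.0), 1.0)
        return self.min_delay + round(frac * (self.max_delay - self.min_delay))


def mean_neighbor_action(actions: Sequence[Sequence[float]]) -> list[float]:
    """Average the neighbours' action vectors element-wise.

    In the general mean-field idea, a critic conditions on this average
    instead of the full joint action, so its input size does not grow
    with the number of neighbours."""
    if not actions:
        raise ValueError("at least one neighbour action is required")
    dim = len(actions[0])
    return [sum(a[i] for a in actions) / len(actions) for i in range(dim)]
```

In a TD3-style training loop, `update` would be called once per critic step and `delay` would be consulted to decide how many critic steps to take before the next actor update. Fixed-delay TD3 corresponds to `min_delay == max_delay`.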













Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 61973244, 72001214).
Cite this article
Qiao, Z., Ke, L. & Wang, X. Traffic signal control using a cooperative EWMA-based multi-agent reinforcement learning. Appl Intell 53, 4483–4498 (2023). https://doi.org/10.1007/s10489-022-03643-9
DOI: https://doi.org/10.1007/s10489-022-03643-9