Traffic flow control using multi-agent reinforcement learning

https://doi.org/10.1016/j.jnca.2022.103497

Abstract

One information-technology-based technology in use today is the vehicular ad hoc network (VANET), which supports inter-road communication. Many developed countries use this technology to optimize travel times, queue lengths, the number of vehicle stops, and overall traffic network efficiency. In this research, we investigate the factors that are critical to increasing the quality of VANETs. This paper focuses on increasing the quality of service using multi-agent learning methods. The innovation of this study is the use of artificial intelligence to improve the network’s quality of service, through a mechanism and algorithm that find the optimal behavior of agents in the VANET. The results indicate that the proposed method outperforms previous methods on the evaluation criteria of packet delivery ratio (PDR), transaction success rate (TSR), phase duration (PD), and queue length (QL). Relative to previous work, the proposed method improves TSR by 6.342%, PDR by 9.105%, QL by 7.143%, and PD by 6.783%.

Introduction

In recent years, methods from computer science, and especially artificial intelligence, have attracted the attention of experts in traffic control. Congestion is a common phenomenon at intersections, and long queues of cars behind traffic lights are seen every day in large cities. This congestion, which occurs continuously at different levels, imposes huge costs on communities. Congestion can also be caused by sudden events that affect traffic, such as accidents, bad weather, and technical defects. Intelligent transportation systems are those that apply telecommunication and control technologies, hardware, and software to land transportation systems.

One of the advantages of artificial intelligence, and especially reinforcement learning, is that it needs neither a model of the environment nor the intervention of a (human) designer. In other words, the learning agent continuously acquires the knowledge necessary for optimal performance through interaction with the environment. The learning agent performs actions in the environment sequentially and receives feedback on those actions in the form of rewards. Meanwhile, multi-agent systems have become popular in traffic control because they closely match the nature of traffic problems. Because traffic environments are uncertain, highly complex, and dynamic, it is very difficult to obtain prior knowledge of the environment and build it into the agent. Therefore, it is vital to have a learning mechanism through which the agent can acquire the necessary knowledge while making decisions and intelligently interacting with the uncertain environment.

In this study, we use artificial intelligence and reinforcement learning to generate datasets that include test data, training data, and simulated data. Because training data are unavailable for the traffic problem, supervised learning methods are not appropriate for this goal. We need a method that can acquire the necessary knowledge without labeled training data. Reinforcement learning, one of the successful approaches in machine learning, provides such an opportunity. In this method, the agent arrives at the optimal policy (proper timing of the traffic light) by interacting with the uncertain environment.

In traffic light control, scheduling parameters are optimized according to the existing traffic conditions (for example, the number of cars waiting at the intersection and the delay time) to achieve goals such as maximizing the number of vehicles crossing the intersection, reducing travel time, reducing queue length, and improving network efficiency. A coordination mechanism for finding the optimal behavior of all agents is another goal of this paper. In this research, the Q-learning algorithm is used for more comprehensive studies (Nazib and Moh, 2021).
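
To make the role of Q-learning concrete, the following is a minimal sketch of a tabular Q-learning update for a single traffic light. It is an illustration only and not the implementation used in this study: the action set, the state encoding (number of waiting cars), the reward (negative queue length), and the values of alpha, gamma, and epsilon are all assumptions chosen for the example.

```python
import random
from collections import defaultdict

ACTIONS = ("keep", "switch")  # hypothetical action set for one traffic light


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def choose_action(q, state, epsilon=0.1, rng=random):
    """Epsilon-greedy selection: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
# Assumed encoding: state = number of waiting cars, reward = negative queue length.
new_value = q_update(q_table, state=5, action="switch", reward=-3, next_state=3)
print(new_value)
```

In this sketch, alpha (learning rate), gamma (discount factor), and epsilon (exploration rate) are the kind of parameters that would be tuned when adapting Q-learning to a particular intersection.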

Applying reinforcement learning to traffic control, which is a multi-agent problem, raises several challenges. The most important is that the learning agents (traffic lights) learn individually and independently, so each eventually finds a behavior that is optimal for itself but that does not necessarily lead to the optimal behavior of all agents together. Addressing this challenge in traffic control is one of the main goals of this study.
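
The gap between individually optimal and jointly optimal behavior can be illustrated with a toy example. The sketch below assumes two neighbouring traffic lights, each choosing between two hypothetical phases, with made-up reward values; none of these numbers come from this study. It compares the outcome in which neither light can improve its own reward by acting alone with the outcome that maximizes the total reward.

```python
from itertools import product

ACTIONS = ("green_NS", "green_EW")  # hypothetical phases for each light

# Hypothetical rewards for two neighbouring lights, indexed by
# (action_of_light_0, action_of_light_1) -> (reward_0, reward_1).
REWARDS = {
    ("green_NS", "green_NS"): (3, 3),
    ("green_NS", "green_EW"): (0, 4),
    ("green_EW", "green_NS"): (4, 0),
    ("green_EW", "green_EW"): (1, 1),
}


def own_reward(agent, joint_action):
    """Reward received by one light (0 or 1) under a joint action."""
    return REWARDS[joint_action][agent]


def is_equilibrium(joint_action):
    """True if neither light can gain by changing only its own action."""
    for agent in (0, 1):
        for alternative in ACTIONS:
            deviated = list(joint_action)
            deviated[agent] = alternative
            if own_reward(agent, tuple(deviated)) > own_reward(agent, joint_action):
                return False
    return True


joint_actions = list(product(ACTIONS, repeat=2))
# Outcomes that are stable when each light optimizes only its own reward.
selfish = [ja for ja in joint_actions if is_equilibrium(ja)]
# Outcome that maximizes the sum of both rewards.
cooperative = max(joint_actions, key=lambda ja: sum(REWARDS[ja]))

print("individually stable:", selfish)
print("jointly optimal:", cooperative)
```

With these assumed rewards, the only individually stable outcome yields a lower total reward than the jointly optimal one, which is the situation a coordination mechanism is meant to avoid.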

We also address the multi-intersection traffic control problem using artificial intelligence, and in particular reinforcement learning. In this study, we aim to reduce queue length and travel time and to increase both network efficiency and the quality of service of the traffic network.

In this study, we use high quality of service, low latency, and improved response time as evaluation criteria. If the stability of the network is compromised, the traffic network and all evaluation criteria are disrupted. It is therefore essential to use prediction methods to manage and control the traffic network. Reinforcement learning, as one of the successful approaches in machine learning, can provide such an opportunity. To reduce queue length and travel time and to increase network efficiency and the quality of service of the traffic network, we use the Q-learning algorithm and adjust its parameters.

Also, by using traffic patterns for highways and maps that contain intersections, we detect intersections with high traffic density and then offer suggestions to users so that they can use the network online.

One of the goals of this research is to increase the quality of service in the VANET. As the quality of service increases, the efficiency of the traffic network increases and clients are satisfied. In addition, the network can have more bandwidth capacity.

Controlling and managing the traffic network is another goal; if it is achieved, we can provide better services and offers to clients. Otherwise, the network is not online and clients cannot access network data.

We assume that, if we use prediction patterns, then we can control and manage traffic. Also, when we use traffic control and artificial intelligence methods, we can control the traffic network and increase the quality of service of the traffic network.

Multiple layers are shared for a series of services in one environment. In the input layer, the massive data of the traffic network is analyzed. To control and manage traffic, a series of special programs is considered in order to increase user satisfaction. In addition, the network load bar can also be optimized. At very large scales, a series of data centers is needed to respond to requests.

In this layer, a series of phrases are considered to respond to the quality of service, and the problem that arises is related to the level of access. In this layer, calculations are performed flexibly and can be performed by systems (see Fig. 1).

Section 2 provides a review of previous work. Section 3 describes the proposed method, including the design of guidance control based on reinforcement learning (defining environmental states, actions, and rewards). The traffic-based agent simulation and its results are examined in Section 5, which is also devoted to discussion and conclusion.

Related work

Wu et al. (2020) proposed the multi-agent automated communication (MAAC) algorithm, a consistent traffic light control method based on multi-agent reinforcement learning (MARL) and an automated communication protocol in an edge computing architecture. The MAAC algorithm combines a multi-agent automated communication protocol with MARL, allowing an agent to establish learning strategies that achieve global optimization in traffic signal control. Considering an

Proposed algorithm

In the proposed framework, a series of services is considered in a common environment and a layer is defined for each service. In the input layer, which is a low-level and common layer of data services, massive traffic data analysis is performed. In addition, there is special planning under the cloud to control and manage traffic in order to increase user satisfaction. The network load bar can also be optimized. At very large scales, several data centers are needed to respond to requests.

In

The basic traffic agent simulation

Vehicles and traffic lights are considered autonomous and intelligent agents. In this study, traffic lights are capable of reinforcement learning and improve their behavior during the simulation. Vehicles, by contrast, are reactive: they cannot learn and have fixed, predetermined behaviors. At each step of the simulation, the position, speed, and acceleration of the vehicles and the phase of the traffic lights are updated. Depending on the amount of traffic flow for each

Discussion and conclusion

We initially assumed that artificial intelligence and traffic control methods could help us control traffic. The results show that these assumptions were correct and confirm them.

Due to the high complexity of traffic, it is difficult to define predetermined behaviors in the VANET. Reinforcement learning, as one of the appropriate methods for enabling agents to learn, can be effective in this field.

One of the major challenges in applying reinforcement learning in

CRediT authorship contribution statement

A. Zeynivand: Conceptualization, Methodology, Software. A. Javadpour: Data curation, Writing – original draft. S. Bolouki: Supervision, Investigation. A.K. Sangaiah: Visualization, Investigation. F. Ja’fari: Software, Reviewing. P. Pinto: Validation, Reviewing. W. Zhang: Writing – review & editing.

Acknowledgments

This work was supported in part by the Fundamental Research Funds for the Central Universities under Grant No. HIT.OCEF.2021007, the Shenzhen Science and Technology Research and Development Foundation under Grant No. JCYJ20190806143418198, the National Key Research and Development Program of China under Grant No. 2020YFB1406902, the Key-Area Research and Development Program of Guangdong Province under Grant No. 2020B0101360001, the Guangdong Provincial Key Laboratory of Novel Security

Ahmad Zeynivand is with the Department of Electrical & Computer Engineering, Tarbiat Modares University, Tehran, Iran. His research fields include fog computing, cloud computing, vehicular ad hoc networks, and the use of AI in fog and cloud computing for resource allocation and optimization in smart cities.

References (22)

  • Aleko, D.R., et al., 2020. An efficient adaptive traffic light control system for urban road traffic congestion reduction in smart cities. Information.
    Amir Javadpour obtained his M.Sc. degree in Medical Information Technology Engineering from University of Tehran, Iran, in 2014. He received a Ph.D. in Computer Science/Mathematics/Cybersecurity from Guangzhou University, China. He has published papers with his colleagues in highly ranked journals and several ranked conferences on topics including Cloud Computing, Software-Defined Networking (SDN), Big Data, Intrusion Detection Systems (IDS), the Internet of Things (IoT), Moving Target Defence (MTD), Machine Learning (ML), and optimization algorithms. He has also reviewed papers for several reputable venues, such as IEEE Transactions on Cloud Computing, IEEE Transactions on Network Science and Engineering, ACM Transactions on Internet Technology, the Journal of Supercomputing, and several Springer and Elsevier journals, and he is a Technical Program Committee (TPC) member of various conferences.

    Sadegh Bolouki received the B.S. degree in electrical engineering from the Sharif University of Technology, Tehran, Iran, and the Ph.D. degree in electrical engineering from Polytechnique Montréal, Montreal, QC, Canada, in 2008 and 2014, respectively. Since September 2018, he has been an Assistant Professor of Electrical and Computer Engineering with Tarbiat Modares University, Tehran. He is also a Research Associate with the Department of Mechanical Engineering, Polytechnique Montréal. His experience includes postdoctoral tenures with the Department of Mechanical Engineering and Mechanics, Lehigh University, Bethlehem, PA, USA, and the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Champaign, IL, USA. His research interests broadly include the areas of machine learning, network science, and game theory.

    Professor Arun Kumar Sangaiah received the master of engineering degree from Anna University, Chennai, India, and the Ph.D. degree from VIT University, Vellore, India. He is currently a Professor with the School of Computing Science and Engineering, VIT University. He holds Visiting Professor positions at various universities around the globe. Furthermore, he has visited many research centers and universities in China, Japan, Singapore, and South Korea to collaborate on research projects and publications. His scientific production spans more than 300 contributions published in high standard ISI journals, such as IEEE Communication Magazine, IEEE Systems Journals, and IEEE Internet of Things (IoT). In addition, he has authored/edited eight books (Elsevier, Springer, etc.) and edited several special issues in reputed ISI journals, such as IEEE Communication Magazine, IEEE Transactions on Industrial Informatics, IEEE Internet of Things, ACM Transaction on Intelligent Systems and Technology, etc. He has also registered one Indian patent in the area of computational intelligence. His Google Scholar citations have reached more than 13300, with h-index: 66 and i10-index: 248. His research interests include E-learning, machine learning, software engineering, computational intelligence, and IoT. Dr. Sangaiah is an Editorial Board Member and an Associate Editor for many reputed ISI journals. Furthermore, he is the recipient of many awards, including the India-Top-10 Researcher award, the Chinese Academy of Science-PIFI overseas visiting scientist award, etc.

    Forough Ja’fari is a Senior Researcher in cybersecurity and computer science. She received her Bachelor’s degree from Sharif University of Technology and her Master’s degree in Computer Network Engineering from Yazd University, Iran. She is a visiting scholar researcher at Guangzhou University, China. Cloud computing, Software-Defined Networking (SDN), Intrusion Detection Systems (IDS), Internet of Things (IoT), Moving Target Defence (MTD), and Machine Learning are some of her research interests. She is currently a Guest Editor (GE) of Cluster Computing (CLUS) Journal, as well as a reviewer for several journals and conferences.

    Pedro Pinto received the M.S. degree in communication networks and services from the University of Porto, Portugal, and the Ph.D. degree in telecommunications jointly from the Universities of Minho, Aveiro, and Porto, Portugal. He is currently a Professor, the Director of the M.S. degree in cybersecurity, and the Data Protection Officer with the Instituto Politécnico de Viana do Castelo, Portugal. He is also a member of the INESC TEC Research Institution. He has published and reviewed several papers in international conferences and journals, and his research interests include areas of computer networks, data privacy, and cybersecurity.

    Weizhe Zhang is currently a Professor and Ph.D. Supervisor in the School of Computer Science and Technology, Harbin Institute of Technology, China. He received his B.Eng, M.Eng, and Ph.D. degrees in computer science and technology in 1999, 2001, and 2006, respectively, from Harbin Institute of Technology. He was a visiting professor at the Department of Computer Science, University of Illinois at Urbana-Champaign (UIUC), USA, from Aug. 2013 to Aug. 2014, and a visiting scholar at the Department of Computer Science, University of Houston (UH), USA, from Aug. 2005 to Feb. 2006. Dr. Zhang has published more than 240 scientific papers in well-established journals including IEEE Transactions on Cloud Computing, IEEE Transactions on Computers, IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Industrial Electronics, IEEE Transactions on Industrial Informatics, IEEE Internet of Things Journal, IEEE Transactions on Intelligent Transportation Systems, IEEE Transactions on Vehicular Technology, and IEEE Transactions on Network Science and Engineering, and in reputable conferences such as IEEE ICDM, IEEE ICPP, IEEE GLOBECOM, IEEE MSST, IEEE ICC, IEEE CLUSTER, IEEE IPDPS, IEEE ICPADS, ACM CIKM, IFIP NPC, etc. He conducts research in cloud computing, high performance computing, parallel and distributed systems, real-time computing, and computer security. He is a senior member of the IEEE, a lifetime member of ACM, and a distinguished member of CCF (China Computer Federation).
