Abstract
With the advancement of technology and the improvement of living standards, the number of vehicles is steadily increasing, and road problems are worsening at the same time. This paper synthesizes current research on predicting dangers on traffic roads. A comparative study was therefore conducted to explore the various solutions and strategies employed to classify dangers and to identify the most important techniques used. An experiment was also developed to compare the most widely used techniques in the same context and on a single dataset. In addition to synthesizing several key points about predicting road traffic dangers, this research allowed us to conclude that ML algorithms can predict the severity of road accidents with 90.46% accuracy and an AUC of 0.94. Random forest and XGBoost are the best models for predicting freeway crashes.
1 Introduction
Intelligent transport systems (ITS) refer to the use of data and technological advances to manage issues in traditional transport systems. They apply innovative technologies to improve the safety, effectiveness, efficiency, accessibility and sustainability of the transport network. Nowadays, the growth in traffic and in the number of vehicles creates new needs for intelligent management [1]. Intelligent traffic management, which is the focus of this work, is one of the most important roles of ITS. Intelligent transportation systems can make a significant contribution to improving the management of road traffic (urban, inter-urban and peri-urban) and, more generally, of the various modes of transport.
In many cases, it is difficult to fully understand the characteristics of the transport system and the relationships between them. AI has enabled intelligent solutions to complex problems, such as complex systems that cannot be managed with traditional methods. Numerous researchers have demonstrated the benefits of AI in transport, for example in:
- Accident forecasting.
- Identification of the factors contributing to accidents: human, environmental, traffic and pavement-related factors.
- Traffic management by forecasting traffic flow, traffic conflicts and speed.
- Prediction of accident severity.
- Detection and identification of obstacles, crashes, cracks, imperfections and aberrations.
- Classification of driver behavior.
Road traffic accidents are among the most critical problems facing the world, causing many deaths and injuries every year as well as economic losses. Building accurate models to predict accident risks, such as the severity of traffic accidents, is therefore an essential task for transportation systems.
The objective of this work is to investigate the potential of AI to identify and predict the types of traffic risks in order to reduce their incidence.
The rest of the paper is organized as follows. Section 2 presents related work on accident risk prediction. Section 3 compares these works, analyzes them and discusses their most important results. Section 4 presents an experimental study comparing the most relevant AI techniques in this area on the same dataset and in the same context. Finally, Sect. 5 draws conclusions and indicates future research directions.
2 Related Work
Highway safety remains a challenge for researchers and practitioners. Accident risk prediction helps to investigate and resolve these issues and to improve operations and safety. In this section, we review various research papers conducted in this context. In [2], the authors aimed to predict the risk of accidents on highways, taking into account both the impact of safety-critical events and traffic conditions. Using data from a simulation experiment, they introduced a new accident risk forecasting model based on the traffic conflict technique (TCT), using time to collision, and a back-propagation neural network (BPNN). For the risk versus risk-free classification, the TPR, AUC and accuracy were 70.79%, 0.75 and 67.79%, respectively; for the high versus low risk classification, the TPR, AUC and precision were 82.33%, 0.93 and 86.62%. [3] proposes a BN-RF model to predict accident risk in real time using data collected on the highway. The method uses a random forest (RF) to rank the importance of explanatory variables according to the Gini index, and a Bayesian network (BN) is then built to predict crashes in real time. Its performance, evaluated with the ROC curve, reached 88.35%. [4] addresses vehicle accident risk prediction by applying the AdaBoost trichotomy algorithm to VANET big data, combined with SMOTE and one-hot encoding (AdaBoost-SO), to build a vehicle accident risk prediction model. Article [5] develops a new real-time traffic accident prediction model that addresses the problem of imbalanced data. The imbalanced classification algorithm is optimized at three levels: the output level, the data level and the MLP algorithm level. Study [6] aimed to develop a model for predicting accident risk in real time on highways in foggy conditions.
Bayesian logistic regression models for three time slices before crashes were built to estimate the relationship between the binary response variable (crash or no crash) and the input variables. [24] proposes an LSTM-based framework that considers traffic data of different temporal resolutions (LSTMDTR) for crash detection on freeways. The traffic conditions before an accident are defined as its preconditions, while traffic conditions that, according to specific criteria, do not lead to an accident are defined as normal traffic conditions. The detection model is designed to distinguish pre-crash conditions from normal traffic conditions, so the samples consist of collision samples (i.e., pre-crash conditions) and non-collision samples (i.e., normal traffic conditions). Article [7] proposes a model for predicting the risk of accidents in real time on arterials, more specifically on road segments, using a model that combines long short-term memory and a convolutional neural network (LSTM-CNN). The objective of [8] is to use deep learning models to detect the occurrence of an accident; the work focuses on designing deep neural networks and on experiments with parameter tuning and the choice of datasets, in comparison with traditional machine learning models. In [25], logistic regression (LR), decision tree (DT), naive Bayes classifiers (NB), random forest (RF), k-nearest neighbors (k-NN), Bayesian logistic regression (BLR) and support vector machine (SVM) are used to predict road accidents based on driving simulation, and the most accurate predictive model is then selected to help reduce highway crashes. Based on the area under the ROC curve, the random forest reached approximately 0.826. In [26], the study aims to develop a model for detecting driving risks in the deceleration zone of a motorway using a generalized regression neural network (GRNN), with ANOVA as the feature selection method.
In addition to these works, other researchers take into account cross-domain data and spatiotemporal dependencies. In [9], the objective is to study how a spatiotemporal deep learning approach, the spatiotemporal convolutional long short-term memory network (STCL-Net), contributes to short-term crash risk prediction at the city scale by exploiting multi-source datasets. The collected data were divided into three types: variables that vary only spatially and are temporally static during the study period, variables that vary only temporally and are spatially static during the study period, and variables that vary both in space and in time. [10] applies and compares two statistical methods, the negative binomial (NB) model and the random effect negative binomial (RENB) model, to predict the number of accidents while considering unobserved heterogeneity in accident data, and identifies the key factors involved in accidents to improve road safety. Article [11] proposes a multi-view spatiotemporal deep learning framework to merge cross-domain urban data, together with a ResNet-based multi-task learning framework with a speed inference model, to predict the accident risk distribution in the near future. [12] identifies spatiotemporal correlation as an important feature of traffic accidents and builds a deep learning model (LSTM) based on this correlation to predict traffic accident risk.
Drivers' behavior and emotional states strongly influence driving efficiency and can aggravate various transport problems. Several works have studied these aspects. Taxi drivers around the world often work very long driving hours, which is associated with a high prevalence of fatigue and accidents. The objective of [13] is to provide a validated prediction model that helps to understand the association between taxi drivers' fatigue-related accident risk (FRAR) and related factors. The authors used self-reported data on fatigue-related accidents, whose validity was examined through the questionnaires, and adopted logistic ridge regression (LRR), logistic regression (LR) and support vector machine (SVM) methods to fit and validate the models.
Speed has a direct influence on the frequency and severity of accidents. In [14], a bidirectional recurrent neural network (BRNN), which is well suited to time-series data, is used for speed prediction and compared with long short-term memory (LSTM) and gated recurrent unit (GRU) models.
Accurate, real-time forecasting of traffic flows plays an important role in building intelligent transport systems and in traffic control and guidance. Article [19] focuses on short-term traffic flow forecasting for the next period (5 min). The authors used k-means to group historical data according to their patterns and then trained a predictor based on a CNN and an LSTM for each group. Article [20] builds a deep architecture for traffic flow prediction (15 min) that can handle incomplete data and learn the spatiotemporal correlations of the traffic network from a deep hierarchical representation of the features. Article [21] presents a residual deconvolution based deep generative network (RDBDGN) to address long-term traffic flow prediction on elevated highways. [22] takes the impact of precipitation into account when predicting traffic flow with a deep bidirectional long short-term memory (DBL) model.
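The two-stage idea described for [19], grouping historical patterns with k-means and then fitting one predictor per group, can be sketched as follows. This is a minimal illustration under our own assumptions: each historical record is a short vector of flow counts (for example, vehicles per 5-min interval), the initial centroids are simply the first k records, and a per-cluster mean of the next-interval flow stands in for the CNN-LSTM predictor used in [19]. The function names are hypothetical.

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means on equal-length vectors; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def fit_cluster_predictors(histories, next_values, k):
    """Cluster flow histories, then fit one deliberately trivial predictor per cluster:
    the mean next-interval flow of its members (a stand-in for a CNN-LSTM)."""
    centroids, labels = kmeans(histories, k)
    predictors = {}
    for c in range(k):
        targets = [y for y, lab in zip(next_values, labels) if lab == c]
        predictors[c] = sum(targets) / len(targets) if targets else 0.0
    return centroids, predictors

def predict(history, centroids, predictors):
    # Route a new history to its nearest cluster and use that cluster's predictor.
    c = min(range(len(centroids)), key=lambda i: math.dist(history, centroids[i]))
    return predictors[c]

# Hypothetical data: two regimes (off-peak and peak), vehicles per 5 min.
histories = [[10, 12, 11], [50, 55, 52], [9, 11, 10], [48, 53, 51]]
next_values = [12, 56, 11, 54]
centroids, predictors = fit_cluster_predictors(histories, next_values, k=2)
print(predict([11, 12, 12], centroids, predictors))  # 11.5 (off-peak cluster)
```

Routing each new input to a regime-specific model lets each predictor specialize on one traffic pattern, which is the motivation for clustering before training.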
Predicting the severity of traffic accidents is an essential task for transportation systems. The main objective of [15] is to build a model that classifies injury severity and to select a set of factors influencing the severity of road accidents. Similarly, in [16], three data mining models were applied to provide a comprehensive analysis of the risk factors related to the severity of road accidents.
3 Comparative Study and Discussion
3.1 Comparative Study
In this section, the solutions discussed above are compared in Table 1. We compare the proposed solutions according to the following characteristics:
- Year.
- Task: the objective of the paper.
- Dataset: the data source and type, its size, and whether the dataset is static or dynamic (in the latter case, we specify whether the stored data change over time, space or both).
- Technical: a synthesis of the approach.
- Comparison: the ML or DL techniques compared.
- Validation: the model performance.
3.2 Discussion
This analysis shows that traffic risk is studied in different contexts. Many works focus on forecasting and identifying accident risks, classifying driver behavior and accident severity, identifying the factors contributing to accidents, and predicting traffic conflicts and traffic flow.
For traffic flow, the purpose of prediction is to estimate the number of vehicles in a given time interval from historical traffic information; numerous studies consider 15 min a typical time interval.
Accident risk prediction is very sensitive to the traffic conditions in the five-minute intervals before an accident.
Road safety risk prediction generally falls into the category of supervised learning. Moreover, building an accident risk prediction model is a typical imbalanced classification problem: the number of majority-class samples is much higher than the number of minority-class samples.
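To see why this imbalance matters, consider a trivial classifier that always predicts the majority class (no accident). The sketch below uses hypothetical counts chosen only for illustration; it shows that such a model can reach high accuracy while detecting none of the accidents, which is why recall, F1-score and AUC are usually reported alongside accuracy.

```python
def majority_baseline(labels):
    """Score a classifier that always predicts class 0 (no accident) on binary labels."""
    predictions = [0] * len(labels)
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    true_pos = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    positives = sum(labels)
    accuracy = correct / len(labels)
    recall = true_pos / positives if positives else 0.0
    return accuracy, recall

# Hypothetical sample: 980 non-accident intervals and 20 accident intervals.
labels = [0] * 980 + [1] * 20
accuracy, recall = majority_baseline(labels)
print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")  # accuracy=0.98, recall=0.00
```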
Traffic data at a single time resolution cannot fully represent the traffic trend and the dynamic transitions between time intervals. To solve this problem, traffic data at different spatiotemporal resolutions can be taken into account.
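One simple way to obtain traffic data at several temporal resolutions is to aggregate the same raw series over windows of different lengths. The sketch below assumes a hypothetical series of per-minute vehicle counts and computes window sums at 1, 5 and 15 min; the function names and the choice of resolutions are illustrative assumptions, not taken from the cited works.

```python
def aggregate(per_minute_counts, window):
    """Sum consecutive, non-overlapping windows of `window` minutes
    (a trailing partial window is dropped)."""
    n_full = len(per_minute_counts) // window
    return [sum(per_minute_counts[i * window:(i + 1) * window]) for i in range(n_full)]

def multi_resolution(per_minute_counts, windows=(1, 5, 15)):
    # Map each resolution (in minutes) to its aggregated series.
    return {w: aggregate(per_minute_counts, w) for w in windows}

counts = [3] * 15 + [6] * 15  # hypothetical: 30 minutes, flow doubles after 15 min
views = multi_resolution(counts)
print(views[5])   # [15, 15, 15, 30, 30, 30]
print(views[15])  # [45, 90]
```

Each resolution can then feed a separate model input, so that fine-grained views capture short transitions while coarse views capture the overall trend.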
The occurrence of an accident may be associated not only with traffic characteristics but also with other factors such as weather conditions or human behavior. Collision cases should therefore contain information about the date and time of the accident, its duration, its location and its severity. When the non-collision samples are collected at the same time, in the same place, on the same day of the week and in the same season as the collision event, the effect of location and weather on traffic conditions can be effectively eliminated.
Traffic data analysis can capture spatial dependencies and extract spatial characteristics from variables that vary only spatially and are temporally static during the study period; temporal dependencies and temporal characteristics from variables that vary only temporally and are spatially static during the study period; and spatiotemporal characteristics from variables that vary both in space and in time over the study period.
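The matched sampling idea can be sketched as selecting, for each collision, the non-collision records observed at the same location and hour, on the same day of the week and in the same season. The record layout (dictionaries with `location`, `hour`, `weekday`, `season` and `crash` keys), the location code "M1-J10" and the function name `matched_controls` are hypothetical choices made for illustration.

```python
def matched_controls(crash, candidates, keys=("location", "hour", "weekday", "season")):
    """Return non-crash records that share every matching key with the crash record.

    Holding location, time of day, weekday and season fixed means that differences
    between crash and non-crash traffic conditions are not driven by those factors.
    """
    return [
        rec for rec in candidates
        if not rec["crash"] and all(rec[k] == crash[k] for k in keys)
    ]

crash = {"location": "M1-J10", "hour": 8, "weekday": "Mon", "season": "winter", "crash": True}
candidates = [
    {"location": "M1-J10", "hour": 8, "weekday": "Mon", "season": "winter", "crash": False},
    {"location": "M1-J10", "hour": 17, "weekday": "Mon", "season": "winter", "crash": False},
    {"location": "M1-J11", "hour": 8, "weekday": "Mon", "season": "winter", "crash": False},
]
print(len(matched_controls(crash, candidates)))  # 1
```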
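When a variable is stored as a time-by-location grid, the three variable types can be told apart mechanically: if every time step holds the same row, the variable is spatial only; if every row is constant across locations, it is temporal only; otherwise it is spatiotemporal. The grid layout and the function `variable_type` are assumptions made for this sketch.

```python
def variable_type(grid):
    """Classify a variable stored as grid[t][s] (time steps x locations).

    'spatial'        varies across locations but is constant over time (e.g. road type)
    'temporal'       varies over time but is identical at every location (e.g. day of week)
    'spatiotemporal' varies in both dimensions (e.g. traffic speed)
    'static'         constant everywhere
    """
    constant_over_time = all(row == grid[0] for row in grid)
    constant_over_space = all(len(set(row)) == 1 for row in grid)
    if constant_over_time and constant_over_space:
        return "static"
    if constant_over_time:
        return "spatial"
    if constant_over_space:
        return "temporal"
    return "spatiotemporal"

road_type = [[1, 2, 3], [1, 2, 3]]  # 2 time steps, 3 locations
weekday = [[0, 0, 0], [1, 1, 1]]
speed = [[60, 45, 70], [55, 40, 72]]
print(variable_type(road_type), variable_type(weekday), variable_type(speed))
# spatial temporal spatiotemporal
```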
At the algorithm level, we can conclude that:
- Unlike static data, which can be handled efficiently by machine learning algorithms, dynamic data (spatially and/or temporally dependent) require more sophisticated algorithms such as deep learning.
- Predicting collision risks involves matching each type of data to the appropriate algorithm and then merging the results.
- Training learning algorithms in parallel may work better than training them sequentially; another interesting alternative is to test different models and then combine them to improve performance.
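Combining models, as suggested above, can be as simple as averaging the predicted probabilities of several independently trained classifiers (soft voting) and thresholding the average. In this sketch, the per-model probabilities and the names `rf`, `xgb` and `lstm` are hypothetical inputs; the models that produce them are assumed to have been trained separately, possibly in parallel.

```python
def soft_vote(model_probabilities, weights=None, threshold=0.5):
    """Average per-sample positive-class probabilities from several models.

    model_probabilities: one list per model, each holding P(accident) for every sample.
    weights: optional per-model weights (equal weights by default).
    Returns (averaged probabilities, 0/1 predictions).
    """
    n_models = len(model_probabilities)
    weights = weights or [1.0] * n_models
    total = sum(weights)
    averaged = [
        sum(w * probs[i] for w, probs in zip(weights, model_probabilities)) / total
        for i in range(len(model_probabilities[0]))
    ]
    return averaged, [1 if p >= threshold else 0 for p in averaged]

# Hypothetical outputs of three models on four samples.
rf = [0.9, 0.2, 0.6, 0.4]
xgb = [0.8, 0.1, 0.4, 0.5]
lstm = [0.7, 0.3, 0.2, 0.9]
probs, labels = soft_vote([rf, xgb, lstm])
print(labels)  # [1, 0, 0, 1]
```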
The most widely used algorithm for risk prediction is LSTM. Fig. 1 shows the eight most used algorithms. The best reported performance was an AUC of 0.93.
At the technical level, the dropout technique, implemented by adding dropout layers to the network, can reduce overfitting and improve the generalizability of the model. Furthermore, the most used performance measures for risk prediction are accuracy, AUC and RMSE.
In the next section, we develop an experimental study to explore the different solutions and strategies employed to classify risk severity and to identify the most important techniques used.
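Dropout randomly zeroes hidden activations during training so that the network cannot rely on any single unit. The sketch below shows the common "inverted dropout" formulation on a plain list of activations: surviving values are scaled by 1/(1 - p) during training so that no rescaling is needed at inference. The function name and the rate p = 0.5 are illustrative assumptions; deep learning frameworks provide their own dropout layers.

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each activation with probability p during training
    and scale the survivors by 1 / (1 - p); at inference, return the input unchanged."""
    if not training or p == 0.0:
        return list(activations)
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - p)
    return [a * keep_scale if rng.random() >= p else 0.0 for a in activations]

rng = random.Random(0)
hidden = [1.0] * 10
dropped = dropout(hidden, p=0.5, rng=rng)
print(dropped)                                    # each entry is either 0.0 or 2.0
print(dropout(hidden, training=False) == hidden)  # True
```

Scaling at training time keeps the expected activation unchanged, so the same network can be used at inference simply by switching `training` off.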
4 Experiments
The purpose of this experiment is to identify the best-performing technique for predicting accident risk; our goal is to build classification rules for prediction with the best-performing model.
This section discusses the methods used in this research study, including data description, preprocessing, building the classification models, and extracting the required knowledge.
4.1 Dataset and Preprocessing
The UK government gathers and publishes, normally on a yearly basis, detailed data on road accidents across the country. This information includes, but is not limited to, geographic locations, weather conditions, vehicle type, number of casualties and vehicle maneuvers, making it a very interesting and comprehensive dataset for analysis and research. The data come from the UK government's Open Data website, where they were published by the Department for Transport.
Our objective is to predict the severity of an accident, which is divided into two categories: fatal or severe, and slight. The full data contain more than 2 million observations and nearly 60 features, so we used a sample of 146,342 observations from 2014 with 34 features.
The data were prepared before building each model. The process includes several steps: cleaning, standardization, feature selection and transformation. The response variable was coded as one (1) for a fatal or severe accident and zero (0) for a slight accident.
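The target coding and standardization steps can be sketched as below. We assume, for illustration only, that the raw severity column holds the text labels "Fatal", "Serious" and "Slight" (the raw files may use numeric codes instead), and that standardization means z-scoring with statistics computed on the training split; the helper names are hypothetical.

```python
import statistics

SEVERE_LABELS = {"Fatal", "Serious"}  # assumed label strings for the fatal/severe class

def encode_severity(labels):
    """Binary target: 1 for a fatal or severe accident, 0 for a slight one."""
    return [1 if lab in SEVERE_LABELS else 0 for lab in labels]

def fit_standardizer(values):
    """Return (mean, std) computed on training values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against a constant column
    return mean, std

def standardize(values, mean, std):
    # z-score: (x - mean) / std, using training-set statistics.
    return [(v - mean) / std for v in values]

print(encode_severity(["Slight", "Fatal", "Serious", "Slight"]))  # [0, 1, 1, 0]
mean, std = fit_standardizer([20, 30, 40])  # e.g. a hypothetical speed-limit column
print(standardize([30, 40], mean, std))
```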
The major accident factors include speed limit, weather, road surface conditions, light conditions, road hazards, road type, pedestrian crossings (physical facilities) and pedestrian crossings (human and other controls).
The data were partitioned into 70% for training and 30% for testing. Because fatal and severe injuries were much less frequent than other injuries, we used the SMOTE resampling strategy to overcome the class imbalance. The SMOTE algorithm generates synthetic positive instances to increase the proportion of the minority class [17].
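The core of SMOTE [17] is to create each synthetic minority sample on the line segment between a minority sample and one of its k nearest minority-class neighbors. The pure-Python sketch below illustrates only that interpolation step; it is not the reference implementation (libraries such as imbalanced-learn provide one), and the function name, the random choice of base samples, the default k = 5 and the seeding are our own choices. In general, oversampling is applied to the training split only, so that no synthetic points leak into the test set.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic new minority-class points by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the minority class.
    For each synthetic point: pick a random minority sample x, pick one of its k
    nearest minority neighbors z, and return x + gap * (z - x) with gap ~ U(0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the other minority samples, nearest first.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )
        z = minority[rng.choice(others[:k])]
        gap = rng.random()
        synthetic.append([xi + gap * (zi - xi) for xi, zi in zip(x, z)])
    return synthetic

# Hypothetical two-feature minority class (e.g. fatal or severe accidents).
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [3.0, 3.0]]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))  # 6
```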
4.2 Classification Models and Performance Measurement
To develop the injury severity prediction model, we used the most widely used and best-performing techniques found in our literature review. The following classification techniques were studied:
- Machine learning: decision tree, logistic regression, random forest, XGBoost and Gaussian naive Bayes.
- Deep learning: LSTM, bidirectional LSTM (Bi-LSTM), GRU and bidirectional GRU (Bi-GRU).
Many metrics exist to assess the performance of classification models and identify the best ones. In this study, we used accuracy, precision, sensitivity (recall), F1-score, the ROC curve and the area under the ROC curve (AUC) to perform this comparison. Table 2 shows the results for each model.
Precision is a measure of exactness or quality, while recall is a measure of completeness or quantity. Bi-LSTM achieves the highest precision among all classifiers (86.26%), and XGBoost achieves the highest recall (99.51%). Random forest achieves the highest accuracy among all classifiers on the test set with SMOTE (90.46%).
AUC is the area under the ROC curve and takes values between 0 and 1: a value of 1 indicates a perfect classifier, while a value close to 0.5 indicates a poor model, since it is equivalent to random classification [18]. RF and XGBoost are the best classification models, with an AUC of 0.94. These encouraging AUCs provide statistical evidence of the strong classification ability of random forest and XGBoost in this study. All the results are illustrated in Fig. 2 and Fig. 3.
Based on the confusion matrix, the test results show that RF and XGBoost seem to perform better than the other models, which is consistent with their area under the ROC curve of approximately 0.94.
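For reference, the threshold-based metrics listed above can be computed from the four cells of a binary confusion matrix, taking the fatal or severe class as positive. The sketch below is a minimal version written for illustration; the counts in the usage line are hypothetical and are not the values reported in Table 2.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall (sensitivity) and F1-score from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts, positive class = fatal or severe accident.
metrics = classification_metrics(tp=80, fp=20, fn=20, tn=880)
print({name: round(value, 3) for name, value in metrics.items()})
# {'accuracy': 0.96, 'precision': 0.8, 'recall': 0.8, 'f1': 0.8}
```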
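AUC can also be read as the probability that a randomly chosen positive case (fatal or severe) receives a higher score than a randomly chosen negative case, with ties counting one half. The sketch below computes it directly from that definition; it is quadratic in the number of samples and meant only to make the interpretation concrete. The labels and scores are hypothetical.

```python
def auc_from_scores(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 0]
print(auc_from_scores(labels, [0.9, 0.8, 0.3, 0.2, 0.1]))  # 1.0 (perfect ranking)
print(auc_from_scores(labels, [0.5, 0.5, 0.5, 0.5, 0.5]))  # 0.5 (no better than random)
```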
5 Conclusions
Highway safety has remained a challenge for researchers and practitioners. Therefore, accident risk prediction helps to investigate and resolve safety issues, and helps to improve operations and safety management.
In this article, we carried out an in-depth study and analysis of the literature, compared different approaches, and concluded with an experiment evaluating the performance of the most widely used techniques for predicting accident risks.
From the comparative study, we drew several conclusions about the problem (prediction of accident risks), the data, the algorithms and the techniques. In the experiment, we used a UK government database of 146,342 accidents to predict accident severity. This study investigated the efficiency of machine learning and deep learning algorithms in building precise and reliable classifiers. Based on the area under the ROC curve and the confusion matrix, the RF and XGBoost algorithms seem to perform better than the other models, with an accuracy of 90.46% and an AUC of 0.94.
For future work, we plan to exploit multi-source datasets to analyze spatial and temporal dependencies and to provide a different comparison for predicting accident risk. Moreover, accident risk may be associated not only with traffic data but also with human behavior. We therefore also aim to study, analyze and classify driver behavior through the detection of facial emotions.
References
[1] Mfenjou, M.L.: Methodology and trends for an intelligent transport system in developing countries. Sustain. Comput. Inf. Syst. 19, 96–111 (2018). https://doi.org/10.1016/j.suscom.2018.08.002
[2] Wang, J., Kong, Y., Fu, T.: Expressway crash risk prediction using back propagation neural network: a brief investigation on safety resilience. Accid. Anal. Prev. 124, 180–192 (2019). https://doi.org/10.1016/j.aap.2019.01.007
[3] Wu, M., Shan, D., Wang, Z., Sun, X., Liu, J., Sun, M.: A Bayesian network model for real-time crash prediction based on selected variables by random forest. In: ICTIS 2019 - 5th International Conference on Transportation Information and Safety, pp. 670–677 (2019). https://doi.org/10.1109/ICTIS.2019.8883694
[4] Zhao, H., Yu, H., Li, D., Mao, T., Zhu, H.: Vehicle accident risk prediction based on AdaBoost-SO in VANETs. IEEE Access 7, 14549–14557 (2019). https://doi.org/10.1109/ACCESS.2019.2894176
[5] Peng, Y., Li, C., Wang, K., Gao, Z., Yu, R.: Examining imbalanced classification algorithms in predicting real-time traffic crash risk. Accid. Anal. Prev. 144, 105610 (2020). https://doi.org/10.1016/j.aap.2020.105610
[6] Zhai, B., Lu, J., Wang, Y., Wu, B.: Real-time prediction of crash risk on freeways under fog conditions. Int. J. Transp. Sci. Technol. 9, 287–298 (2020). https://doi.org/10.1016/j.ijtst.2020.02.001
[7] Li, P., Abdel-Aty, M., Yuan, J.: Real-time crash risk prediction on arterials based on LSTM-CNN. Accid. Anal. Prev. 135, 105371 (2020). https://doi.org/10.1016/j.aap.2019.105371
[8] Huang, T., Wang, S., Sharma, A.: Highway crash detection and risk estimation using deep learning. Accid. Anal. Prev. 135, 105392 (2020). https://doi.org/10.1016/j.aap.2019.105392
[9] Bao, J., Liu, P., Ukkusuri, S.V.: A spatiotemporal deep learning approach for citywide short-term crash risk prediction with multi-source data. Accid. Anal. Prev. 122, 239–254 (2019). https://doi.org/10.1016/j.aap.2018.10.015
[10] Yan, Y., Zhang, Y., Yang, X., Hu, J., Tang, J., Guo, Z.: Crash prediction based on random effect negative binomial model considering data heterogeneity. Phys. A Stat. Mech. Appl. 547, 123858 (2020). https://doi.org/10.1016/j.physa.2019.123858
[11] Zhou, Z., Chen, L., Zhu, C., Wang, P.: Stack ResNet for short-term accident risk prediction leveraging cross-domain data. In: Proceedings - 2019 Chinese Automation Congress, CAC 2019, pp. 782–787 (2019). https://doi.org/10.1109/CAC48633.2019.8996483
[12] Ren, H., Song, Y., Wang, J., Hu, Y., Lei, J.: A deep learning approach to the citywide traffic accident risk prediction. In: IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC 2018-November, pp. 3346–3351 (2018). https://doi.org/10.1109/ITSC.2018.8569437
[13] Li, M.K., Yu, J.J., Ma, L., Zhang, W.: Modeling and mitigating fatigue-related accident risk of taxi drivers. Accid. Anal. Prev. 123, 79–87 (2019). https://doi.org/10.1016/j.aap.2018.11.001
[14] Bohan, H., Yun, B.: Traffic flow prediction based on BRNN. In: ICEIEC 2019 - Proceedings of 2019 IEEE 9th International Conference on Electronics Information and Emergency Communication (2019)
[15] Almamlook, R.E., Kwayu, K.M., Alkasisbeh, M.R., Frefer, A.A.: Comparison of machine learning algorithms for predicting traffic accident severity. In: 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology, JEEIT 2019 - Proceedings, pp. 272–276 (2019). https://doi.org/10.1109/JEEIT.2019.8717393
[16] AlKheder, S., AlRukaibi, F., Aiash, A.: Risk analysis of traffic accidents' severities: an application of three data mining models. ISA Trans. 106, 213–220 (2020). https://doi.org/10.1016/j.isatra.2020.06.018
[17] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. (2002). https://doi.org/10.1613/jair.953
[18] James, G., Witten, D., Hastie, T., Tibshirani, R.: An Introduction to Statistical Learning with Applications in R (2013)
[19] Ma, D., Sheng, B., Jin, S., Ma, X., Gao, P.: Short-term traffic flow forecasting by selecting appropriate predictions based on pattern matching. IEEE Access 6, 75629–75638 (2018). https://doi.org/10.1109/ACCESS.2018.2879055
[20] Liao, S., Chen, J., Hou, J., Xiong, Q., Wen, J.: Deep convolutional neural networks with random subspace learning for short-term traffic flow prediction with incomplete data. In: Proceedings of the International Joint Conference on Neural Networks, pp. 1–6, July 2018. https://doi.org/10.1109/IJCNN.2018.8489536
[21] Zang, D., Fang, Y., Wei, Z., Tang, K., Cheng, J.: Traffic flow data prediction using residual deconvolution based deep generative network. IEEE Access 7, 71311–71322 (2019). https://doi.org/10.1109/ACCESS.2019.2919996
[22] Wang, J., Hu, F., Xu, X., Wang, D., Li, L.: A deep prediction model of traffic flow considering precipitation impact. In: Proceedings of the International Joint Conference on Neural Networks, July 2018. https://doi.org/10.1109/IJCNN.2018.8489033
[23] Formosa, N., Quddus, M., Ison, S., Abdel-Aty, M., Yuan, J.: Predicting real-time traffic conflicts using deep learning. Accid. Anal. Prev. 136, 105429 (2020). https://doi.org/10.1016/j.aap.2019.105429
[24] Jiang, F., Yuen, K.K.R., Lee, E.W.M.: A long short-term memory-based framework for crash detection on freeways with traffic data of different temporal resolutions. Accid. Anal. Prev. 141, 105520 (2020). https://doi.org/10.1016/j.aap.2020.105520
[25] Al Mamlook, R.E., Ali, A., Hasan, R.A., Mohamed Kazim, H.A.: Machine learning to predict the freeway traffic accidents-based driving simulation. In: Proceedings of the IEEE National Aerospace Electronics Conference, NAECON, July 2019, pp. 630–634 (2019). https://doi.org/10.1109/NAECON46414.2019.9058268
[26] Qi, W., Wang, Z., Tang, R., Wang, L.: Driving risk detection model of deceleration zone in expressway based on generalized regression neural network. J. Adv. Transp. (2018). https://doi.org/10.1155/2018/8014385
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bouhsissin, S., Sael, N., Benabbou, F. (2022). Prediction of Risks in Intelligent Transport Systems. In: Lazaar, M., Duvallet, C., Touhafi, A., Al Achhab, M. (eds) Proceedings of the 5th International Conference on Big Data and Internet of Things. BDIoT 2021. Lecture Notes in Networks and Systems, vol 489. Springer, Cham. https://doi.org/10.1007/978-3-031-07969-6_23
DOI: https://doi.org/10.1007/978-3-031-07969-6_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-07968-9
Online ISBN: 978-3-031-07969-6
eBook Packages: Intelligent Technologies and Robotics (R0)