1 Introduction
According to the
National Highway Traffic Safety Administration (NHTSA), the number of auto accidents increased from 5.419 million in 2010 to 6.756 million in 2019, an increase of nearly 25% over the decade. Of the 6.756 million accidents in 2019, 41% resulted in injuries or fatalities [1]. This trend is likely to continue as the number of vehicles on the road increases.
Researchers have used a variety of approaches to investigate causes of auto accidents. For example, in a 1980 report, Treat [
2] examined over a five-year period how frequently various human, environmental, and vehicular factors were involved in traffic accidents by studying 13,568 police-reported accidents, of which 2,258 were investigated on-scene by technicians and 420 by a multidisciplinary team. Human errors were identified as definite causes in 70.7% of the accidents, environmental factors in 12.4%, and vehicular factors in 4.5%. In 20% of the cases, no definite cause was identified. A taxonomy of direct human causes was developed based on an information-processing model of the driver as a vehicle controller. Singh [
3] analyzed data from 5,470 crashes occurring between July 3, 2005 and December 31, 2007. Driver, vehicle, and environment-related information was collected at crash scenes as part of the National Motor Vehicle Crash Causation Survey, conducted by the U.S. National Highway Traffic Safety Administration (NHTSA). The last event in the crash causal chain (also known as the critical reason for the crash) was attributed to the driver in 94 percent (±2.2%) of the crashes, to failure or degradation of a vehicle component in 2 percent (±0.7%), and to the environment (slick roads, weather, etc.) in 2 percent (±1.3%). Recognition errors accounted for about 41 percent (±2.1%) of crashes, decision errors for 33 percent (±3.7%), and performance errors for 11 percent (±2.7%). Dingus et al. [
4] used a naturalistic driving dataset of 905 injurious and property damage crashes; they found driver-related factors—such as error, impairment, fatigue, and distraction—in almost 90% of crashes.
Environmental factors including weather conditions (rain, sleet, snow, and fog) and road pavement conditions (wet, snowy/slushy, or icy) can cause major accidents. On average, nearly 5,000 people are killed and over 418,000 people are injured in weather-related crashes each year [
5]. According to NHTSA statistics, the top environmental factors leading to collisions are slick roads (50%) and glare (17%) [
3].
In short, major causes of accidents include human error and environmental factors, with human error accounting for roughly 71–90% [2–5]. Major categories of human error include speeding, distraction, fatigue, and drunk driving. Major categories of environmental factors include road conditions and weather conditions [5]. In this study, we broadened the scope of distraction to include fatigue and drunk driving, and broadened road conditions to include weather conditions, since snow and rain contribute to the degradation of road conditions. Therefore, speeding, distraction, and road conditions were considered primary factors in designing the driving advisor tool described in this study.
2 Literature Review
This section briefly reviews research on advanced driver assistance technologies. In addition, because successful use of these technologies requires human drivers to have appropriate levels of trust in the technology, research related to human drivers’ interactions with driver assistance systems is also reviewed.
Advanced driver assistance systems.
Advanced driver assistance systems (ADAS) are active automotive safety systems that utilize advanced sensors such as cameras, radar, lidar, and map databases; they comprise a hardware layer for sensing and a software layer of intelligence for post-processing and decision making [
6]. ADAS are often classified by the level of automation they achieve, using the
Society of Automotive Engineers (SAE) scale, which ranges from 0 (No automation) to 5 (Full Automation) [
6–
7]. While there has been significant
research and development (R&D) activity at levels 4 and 5, most ADAS currently on the market are between levels 2 and 3 [
6,
8].
Yi et al. [
9] suggest that driving assistance systems can be classified into three categories: (1) safe driving systems—such as adaptive cruise control, lane keeping, collision avoidance—which focus on the vehicle; (2) driver monitoring systems, which monitor drivers and warn them about abnormal driving behaviors and cognitive states; and (3) in-vehicle information systems that provide information and services for the driver, such as directions and traffic conditions. These applications have been implemented using a variety of technologies for sensing and perception (cameras, radar, lidar) and decision-making (artificial intelligence, machine learning and data fusion) [
10–
15].
The bulk of ADAS efforts reported in the literature have focused on enhancing vehicle capabilities with a view toward achieving level 5. However, all currently available ADAS applications require a human driver to be alert and ready to take control if needed; humans will likely continue to be involved in driving for the foreseeable future. At the same time, increasing levels of driving automation introduce new complexities into human interactions with cars and can be a double-edged sword [
16–
18]. For example, studies with level 3 vehicles have found that situations in which the driver must manually take over control from the automated mode
increase collision risk with surrounding vehicles [
13].
ADAS technologies often provide other types of driver assistance in addition to autonomous driving. For example, Tesla's Model X emits a tone or beep to alert drivers when their hands are not on the steering wheel. Honda and Jaguar have projects to detect a driver's mental state based on factors such as facial expressions, voice, heart rate, and respiration rate [
9]. However, Yi et al. [
9] note that these systems are generic—based on models developed from behavior of many different drivers—not personalized to individual drivers.
Needed are adaptive technologies that can help drivers of autonomous vehicles avoid crashes based on multiple real-time data streams. For example, one way of assisting drivers is to provide adaptive speech-based advice as needed, such as telling the driver to speed up, slow down, or stop. This guidance can be based on external factors (such as road or weather conditions), vehicle factors (such as speed and lane keeping), and indicators of the driver's internal state (such as fatigue). Trust in the ADAS can also be a consideration.
Trust. A fundamental issue affecting human interactions with autonomous vehicles is trust [
19]. To successfully interact with an ADAS, a human driver needs to have an appropriate level of trust in the system, known as
calibrated trust [
20]. Too much trust can cause the human to fail to intervene when the system performs incorrectly; too little trust forgoes the system's benefits. If human involvement is required, an ADAS should be able to assess how much trust the driver/operator is placing in the system and consider that trust when determining how to provide driver assistance.
Learned trust is a construct that captures how trust evolves over time from initial introduction of an agent to experiencing interaction with an agent to longer‐term interactions [
21].
Situational trust is the construct that captures how trust changes based on the external environment (i.e., road types, road conditions, traffic, weather) and internal dynamic characteristics of the operator (mood, attentional capacity, self‐confidence) [
21]. Recently, learned and situational trust were specifically mapped onto measures of automated driving [
19].
New research has also mapped how specific task behaviors during automation use, such as operator interventions, verification behaviors, and response time, can correspond to trust behaviors [22]. In the driving domain specifically, braking is a common way to disengage automated driving and parking and thus can serve as an indicator of distrust. Researchers have confirmed this in the lab by using braking frequency and magnitude as indicators of distrust in automated driving styles [23]. Using a real Tesla vehicle, researchers used braking interventions to show that distrust decreased with repeated use of the automated parking system [24] and that distrust decreased more when drivers were shown how the system worked than when they were only told [25].
Previous research has extensively modeled the relationship between trust, reliance, and ultimate use of automated and robotic systems [
16,
20–
22,
24–
27]. Early work focused on the relationship between machine accuracy, operator self-confidence, reliance, and trust [
16,
17,
28–
31]. For example, relative trust (trust – self-confidence) was shown to predict reliance on the automation [
29]. Other work showed that trust in the system was higher when the automation was more accurate or reliable [
30–
Dynamic models of trust calibration covering the stages before and during interaction have been carefully mapped out [20, 21, 27]. Subsequent work endeavored to chart the antecedents of trust in automated and robotic systems, broadly classifying important factors related to the machine, the human, and the environment and context [
26,
33,
34]. Critically, recent work has mapped the theoretical trust concepts originally conceived by Mayer et al. [35] directly onto the many measures of trust (self-report, behavioral, and physiological indices) developed over the last several decades. As indicators of risk taking, behavioral measurements such as interventions, verification behaviors, reliance, and response time can reflect the trust relationship [
22]. A more tailored approach specifically to trust in automated driving was recently detailed [
19].
Assessing trust. A number of techniques have been developed to measure trust in automated systems such as self-driving vehicles and robots [
22,
30,
33,
36,
37]. Survey instruments that collect self-reports of perceptions of trust—such as the
Trust Of Automated Systems Test (TOAST) [
34] and the
Multi-Dimensional Measure of Trust (MDMT) [
38,
39]—have been most commonly used. Physiological measures such as eye-tracking, EEG, and galvanic skin response have also been used [
24,
25,
40,
41]. In addition, when using a vehicle (rather than a simulation), telemetry data such as location, turning, braking, acceleration, and lane keeping can be used to assess driving behavior.
Vehicle telemetry data is the most ecologically valid way of collecting data related to trust, but as of this writing, there have been relatively few reports of telemetry data being used to assess trust in ADAS. Trust in automation of autonomous vehicles can be described for individual features such as autopilot, cruise control, turning, braking, acceleration, and lane keeping. The Tesla Model X controller records and broadcasts this type of information in real-time via its
Controller Area Network (CAN) bus architecture. These data can be accessed via an
On-Board Diagnostics (OBD) port in real time and used to assess whether a driver is under- or over-trusting the vehicle's capabilities. Sensory devices, such as a Tobii eye tracker or the Mobileye vision system, can also be used to detect lane keeping and distracted driving [
19,
42].
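To make this data path concrete, the following minimal sketch shows how raw CAN frames could be captured in Python with the python-can library. The channel name, interface type, and arbitration ID are illustrative placeholders rather than Tesla-specific values; decoding actual frames would additionally require the vehicle's CAN database (DBC) definitions.

```python
# Minimal sketch of capturing raw vehicle telemetry frames with python-can.
# The channel, interface, and arbitration ID below are hypothetical
# placeholders, not Tesla-specific values.
import can

def log_speed_frames(n_frames: int = 100) -> list[bytes]:
    """Capture raw payloads for a hypothetical speed-related frame ID."""
    SPEED_FRAME_ID = 0x155  # placeholder arbitration ID
    payloads = []
    with can.interface.Bus(channel="can0", interface="socketcan") as bus:
        while len(payloads) < n_frames:
            msg = bus.recv(timeout=1.0)  # block up to 1 s for the next frame
            if msg is not None and msg.arbitration_id == SPEED_FRAME_ID:
                payloads.append(bytes(msg.data))
    return payloads
```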
Situation awareness. Situation awareness is an important factor that determines driving performance. For example, a recent computational approach for modeling a driver's intent using naturalistic driving data demonstrated that lane change performance improved for drivers who checked their mirrors for more than six seconds [
43], an approximate measure for situation awareness. Driving with automated vehicles might raise unique issues such as drivers finding themselves “out of the loop” [
44] or with situation awareness being affected while driving with different levels of automated assistance [
45]. A review found that situation awareness can deteriorate during adaptive cruise control and highly automated driving when drivers engage in non-driving-related tasks, but can improve if drivers are motivated, instructed to pay better attention, or receive feedback [
46]. More recent work, investigating driving with real automated vehicles on the road, has demonstrated reduced situation awareness of the automated vehicle, increased complacency, and over-trust in automation [
45,
47].
Summary. The majority of ADAS efforts reported in the literature have focused on enhancing vehicle capabilities with a view toward achieving fully automated driving. However, all currently available ADAS applications require a human driver to be alert and ready to take control if needed. Partially automated driving introduces new complexities to human interactions with cars and can even increase collision risk. A better understanding of drivers’ trust in automation may help reduce these complexities.
Techniques for measuring trust in automated systems include use of surveys to collect self-reports of perceptions of trust; use of physiological measures such as eye-tracking, EEG, and galvanic skin response; and, in the case of autonomous driving, use of vehicle telemetry data such as location, turning, braking, acceleration, and lane keeping. Although vehicle telemetry data is the most ecologically valid way of collecting data related to trust, there have been relatively few reports of vehicle telemetry data being used to assess trust in ADAS. Needed is research on the feasibility of using vehicle telemetry data to understand the driver's state of mind.
In addition, although some ADAS technologies provide other types of driver assistance—such as a tone or beep to alert drivers when their hands are not on the steering wheel—these systems are not personalized to individual drivers. Needed are adaptive technologies that can help drivers of autonomous vehicles avoid crashes based on multiple real-time data streams.
The objectives of this research are to (1) identify sensory information and vehicle telemetry data needed to increase driving safety; (2) propose an architecture for an adaptive assistant that can provide verbal guidance to drivers of autonomous vehicles; (3) develop multi-stage sensor fusion models to provide adaptive assistance for drivers; (4) evaluate the models using in-field and simulated data; and (5) suggest future work on adaptive assistance for drivers of autonomous vehicles.
3 Architecture for Adaptive Autonomous Driving Advisor
The overall goal of this research is to develop an Adaptive Autonomous Driving Advisor (AADA) that can provide adaptive speech-based advice as needed, such as telling the driver to speed up, slow down, or stop. AADA will be built upon an existing data acquisition and measurement system called Ergoneers. Ergoneers is a custom-built PC-based system that includes multiple communication ports as well as CAN bus ports. Sensory devices such as GPS, Tobii eye tracker, Mobileye Camera, and the Tesla CAN bus cable can be integrated for acquisition of both vehicle data and driver physical data such as eye and head movements.
AADA will be based on the factors identified in the Introduction: Speed, Road Conditions, Distraction, and Trust. To acquire this information, data from Tesla's CAN bus, GPS, Tobii eye tracker, and Mobileye camera will be utilized and integrated to trigger the AADA to provide appropriate voice instructions via a multistage modeling approach. Figure
1 outlines which sensors will provide which kinds of measurements and information and how the information will be fused via Stage I and Stage II models. Stage I will include four models: a linear model to measure the speed condition (i.e., speeding, normal, or below speed limit), two weighted utility models to predict road conditions and driver distraction, and an
Artificial Neural Network/Support Vector Machine/Random Forest (ANN/SVM/RF) model to predict trust. Stage II will use an ANN/SVM/RF model to integrate the outputs from the Stage I models and trigger appropriate voice instructions.
In this paper, we focus on the development of the adaptive ANN/SVM/RF models used in Stages I and II. The ultimate goal is to develop an adaptive sensor fusion algorithm that can improve its performance as the amount of data increases, since the Adaptive Autonomous Driving Advisor can be used by the same individual over time.
Artificial neural networks (ANN) and
support vector machines (SVM) were used to develop the machine learning algorithms due to their effectiveness and computational efficiency when handling regressions with high-dimensional, non-linear, covariant inputs [
49–
52]. The
Random Forest (RF) method was also used due to its reputation for robustness to real-world data.
Artificial neural networks are widely used supervised learning algorithms that can approximate highly non-linear relationships [
53,
54]. However, ANN models have some drawbacks. Most notably, they require considerable training time to make accurate predictions, and they often generalize poorly to unseen ("unknown") data due to their stochastic nature [
55,
56]. Therefore, a deterministic non-linear regression method may be preferred when limited training data are available.
Support vector machines, another family of supervised learning models used in classification and regression analyses, are deterministic [
57,
58]. The support vector machine is intended to be a robust tool for classification and regression in noisy, complex domains. The two key features of support vector machines are generalization theory, which leads to a principled way to choose a hypothesis; and kernel functions, which introduce non-linearity in the hypothesis space without explicitly requiring a non-linear algorithm [
59].
A principal difference between SVMs and ANNs lies in risk minimization mechanics [
60–
62]: SVMs employ the
structural risk minimization (SRM) principle to minimize an upper bound on the expected risk, whereas ANNs apply traditional
empirical risk minimization (ERM) to training data. In several fields [
49,
55,
56,
60–
65], SVM models are more robust and deterministic than ANNs while SVM predictions are comparable to ANN results. However, SVM model accuracy levels depend heavily on the experimental data used.
Random Forest is an ensemble machine learning method consisting of a collection of individual decision trees; the final prediction is obtained by a majority vote (or average, for regression) over the trees' individual predictions [
50,
51]. Decision trees split on features to create decision boundaries, commonly using the Gini impurity measure as the criterion for optimally splitting nodes. Each tree in the forest considers a random subset of the features, leading to different best splits and thus distinct trees with their respective predictions. Random Forests inherently perform feature selection, and aggregating votes across all the decision trees reduces overfitting on the training set. In addition, RF methods require relatively little configuration to obtain high accuracy.
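As an illustration of the trade-offs discussed above, the following sketch fits the three model families to the same synthetic non-linear regression task using scikit-learn. It is a Python stand-in for the MATLAB implementations used in this study; the data and hyperparameters are illustrative assumptions.

```python
# Illustrative comparison (not the authors' MATLAB implementation) of the
# three model families on the same small, non-linear regression task.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))        # three predictor attributes
y = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2]  # synthetic non-linear target

models = {
    "ANN (ERM, stochastic)": MLPRegressor(hidden_layer_sizes=(8,),
                                          max_iter=5000, random_state=0),
    "SVM (SRM, deterministic)": SVR(kernel="rbf", C=10.0, gamma="scale"),
    "Random Forest (ensemble)": RandomForestRegressor(n_estimators=200,
                                                      random_state=0),
}
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {r2:.3f}")
```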
4 Stage I Model Development: Predicting Trust in Automation
The focus of the Stage I model development process was on developing an ANN/SVM/RF model for using vehicle data to predict a user's trust in automation. Hoff & Bashir propose a framework consisting of three types of trust: Dispositional Trust, Learned Trust, and Situational Trust [
21]. Madison et al. describe how Hoff & Bashir's framework might be applied in the context of driving automation [
19]. Dispositional trust considers user characteristics such as age, personality, tendency to take risks, and attitudes toward automation. Learned trust is trust based on past experience with a specific system. Situational trust varies based on the external environment and the internal state of the driver. For example, situational trust in an ADAS might vary based on the driver's perception of the vehicle's ability to perform under certain driving scenarios. As part of the modeling process, we also conducted experiments in realistic driving conditions with a Tesla Model X to assess the applicability of Hoff & Bashir's framework within the context of driving automation.
4.1 Experiment Setup
In June and July 2021, nine subjects participated in the designed experiments. Subjects were males between the ages of 18 and 22 with no prior experience with Tesla Autopilot. The Tesla vehicle was a Model X running software version 2021.4.18. Each subject completed three drives along the same route. For the first two drives, one had to be Manual and the other Autopilot; the sequence was left up to the driver. For the third drive, the driver could choose either Autopilot or Manual mode. While in Autopilot mode, subjects could disengage and re-engage the Autopilot if desired; the Manual mode was manual only. The manual driving mode was included to create a baseline for individual driving performance in the Tesla, providing a way to compare Autopilot-plus-human-driver performance to human driver performance alone.
The experiment tasks were to (1) complete preliminary individual attribute surveys, including the Trust Of Automated Systems Test (TOAST), before driving; (2) drive the Tesla around a designated loop-shaped route three times (Figure
2) using one of three modes (Manual, Autopilot, and Driver's Preference) each time; (3) complete surveys at the end of each loop, including the
Multi-Dimensional Measure of Trust (MDMT); and (4) complete a post drive questionnaire at the end of the drive. The post-drive questionnaire asked participants to rate how much they trusted the Autopilot feature, which was considered to indicate trust in automation.
Figure
2 shows the learned trust associated with the three different drives and situational trust along the driving path for several different driving situations (downhill, straight line, turn, and curve). The goal was to use vehicle data about driving behavior under different drives and situations to model a driver's level of trust in automation. Results from TOAST were used to assess dispositional trust; results from the MDMT and the post-drive questionnaire were used to assess learned trust; and comparisons of driving data under different road conditions were used to assess situational trust.
4.2 Modeling Process
The modeling process was as follows:
•
Identify which of the self-report trust measures best indicate Trust in Automation.
•
Identify vehicle data that strongly correlate with the Trust in Automation measures.
•
Fit the data into a distribution and generate data for modeling and testing.
•
Develop and evaluate a model to predict Trust in Automation based on the identified effective attributes.
•
Further evaluate model accuracy using field data and other tests.
Following are the experiments, analysis, and modeling efforts based on the procedure described above.
Step 1. Identify which of the self-report trust measures best indicate Trust in Automation. To identify which self-report measures best indicate Trust in Automation, we first computed correlation coefficients between the Multi-Dimensional Measure of Trust (MDMT) survey administered at the completion of each loop and the post-drive questionnaire. The MDMT measures 16 attributes of trust [
39]. The scale is divided into two major constructs: capacity trust and moral trust. Capacity trust has two subscales: reliable and capable. The reliable subscale comprises four attributes: reliable, predictable, someone you can count on, and consistent. The capable subscale comprises capable, skilled, competent, and meticulous. Moral trust has two subscales: ethical and sincere. The ethical subscale comprises ethical, respectable, principled, and integrity. The sincere subscale comprises sincere, genuine, candid, and authentic. The breakdown and alphas for each dimension can be found in Ullman and Malle [
48]. The post-drive questionnaire asked subjects to rate their Trust in Automation using a Likert scale.
Table
1 shows the correlations between the post-drive trust rating and each of the 16 MDMT attributes. Note that the MDMT uses a 7-point Likert scale and the post-drive questionnaire uses a 5-point Likert scale. The 16 MDMT attributes can be grouped into four sub-scales:
Reliable, Capable, Ethical, and
Sincere [
39]. Table
2 shows the correlations between the four subscale values and the trust rating from the post-drive questionnaire.
Results suggest that both the individual attribute Reliable and the Reliable subscale assessed at the evaluation points are moderately correlated with the self-assessment of Trust in Automation in the post-drive questionnaire. The Ethical subscale also showed some degree of correlation, but since the correlation coefficient for Reliable was higher than for Ethical, only the data for Reliable (from the evaluation points and post-drive) were used as the measure of Trust in Automation (the dependent variable) in formulating the prediction model.
Step 2. Identify vehicle data that strongly correlate with the Trust in Automation measures. Vehicle data were collected in real time from the Tesla Model X CAN bus via the OBD port of the custom-built Ergoneers data acquisition system. Each experiment ran for about two hours, yielding approximately one hour of vehicle data. Each data file contains information for 143 different attributes and approximately 1.5 million vehicle-related records. Processing the data to identify when each major event happened is challenging. We first used a bird's-eye-view approach to recognize major events, such as the start and stop times of each drive and when the Autopilot was On or Off. Figure
3 shows GPS data of the driving route and a bird's eye view of the data for autopilot, speed, braking, and distance to the lane line to the left of the vehicle, which was used to track the vehicle's lane keeping performance.
This view can reveal a variety of information, including when an event (such as enabling Autopilot for the drive) starts and stops, and the length, frequency, and variation of each event. For example, the Autopilot attribute has a value of 3 when Autopilot is On and 2 when it is Off; from the Autopilot plot in Figure
4, we can determine that there were three different drives, that the first and third drives were in autopilot mode, and that the first drive lasted about 15 minutes. From the Distance to Left plot, we can see how much a driver deviates from the lane line to the left of the vehicle. By examining the band width and shape, we can determine if the vehicle is driving in a straight line. If a driver does not have good control of the vehicle, the band will fluctuate a great deal. For example, in the plot shown in Figure
4, the driver consistently shifts to the right.
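This segmentation of the raw signal can be automated. The sketch below is a hypothetical helper that recovers Autopilot-On intervals using the encoding described above (3 = On, 2 = Off); the column names time_s and autopilot are placeholders for the logged CAN attributes.

```python
# Sketch: recover drive segments from the Autopilot status signal
# (3 = On, 2 = Off, per the encoding described in the text).
import pandas as pd

def autopilot_segments(df: pd.DataFrame) -> pd.DataFrame:
    """Return start/stop times and durations of Autopilot-On intervals."""
    on = (df["autopilot"] == 3).astype(int)
    edges = on.diff().fillna(on.iloc[0])     # +1 = engage, -1 = disengage
    starts = df.loc[edges == 1, "time_s"].tolist()
    stops = df.loc[edges == -1, "time_s"].tolist()
    if len(stops) < len(starts):             # still engaged at end of log
        stops.append(df["time_s"].iloc[-1])
    return pd.DataFrame({
        "start_s": starts,
        "stop_s": stops,
        "duration_min": [(b - a) / 60 for a, b in zip(starts, stops)],
    })
```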
To identify the vehicle data that correlate strongly with the Trust in Automation performance measure, correlation coefficients were calculated to determine input candidates for the prediction model. There were three driving modes (Manual, Autopilot, Driver's Choice) and four driving situations (straight line, downhill, curve, turn) recorded for each drive. Although the Tesla CAN bus broadcast data for 143 attributes, not all of them were considered useful for modeling purposes. Therefore, for each situation (e.g., downhill driving), only six attributes were collected: number of braking events, average braking time, average speed, standard deviation of speed, average distance to the left lane line, and standard deviation of distance to the left lane line. The resulting 24 attributes (delineated in Table
3) were computed for each driving attempt in this study.
There were six viable datasets, each with two attempts in Autopilot mode, for a total of 12 Autopilot attempts. For each driving attempt, the correlation coefficient between each of the 24 attributes and the subject's self-assessment of Trust in Automation was calculated. Table
4 shows the 7 of the 24 vehicle attributes that yielded moderate to strong correlations with the self-assessment of Trust in Automation. For development of the prediction model, we used data from the top three attributes as inputs: DSa (Downhill Speed average), TLs (Turn Length standard deviation), and CSs (Curve Speed standard deviation). These attributes all correlated strongly with the dependent variable of Trust in Automation (based on the MDMT Reliable sub-scale).
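The following sketch illustrates this screening step in Python, assuming a hypothetical table with one row per driving attempt, columns for the 24 per-situation vehicle attributes, and a TIA column holding the self-assessed trust rating.

```python
# Sketch of the Step 2 correlation screen: rank per-attempt vehicle
# attributes by the strength of their Pearson correlation with the
# self-assessed Trust in Automation score. Column names are assumptions.
import pandas as pd

def rank_attributes(attempts: pd.DataFrame, target: str = "TIA",
                    top_k: int = 3) -> pd.Series:
    """Return the top_k attributes most correlated (by |r|) with the target."""
    corrs = attempts.drop(columns=[target]).corrwith(attempts[target])
    ranked = corrs.reindex(corrs.abs().sort_values(ascending=False).index)
    return ranked.head(top_k)

# e.g. rank_attributes(df) might surface DSa, TLs, and CSs as top inputs
```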
Step 3. Fit the data into a distribution and generate data for modeling and testing. Because the number of viable data sets was limited, we first fit the data identified in Step 2 to distributions and then generated the data needed for modeling. The vehicle data captured while the driver was in Autopilot mode were fit to suitable distributions and validated using the Lilliefors test. The Lilliefors test, a variant of the Kolmogorov–Smirnov test, is appropriate for small data sets (fewer than 25 samples). Following is a summary of the process for each attribute of interest and performance measure.
(a)
Trust in Automation: The top two distribution fitting candidates were Lognormal and Normal distributions. Since the lower and upper bounds range from 1 to 7, the suggested distribution was further adjusted to Normal (4,0.95) which covers approximately 99.73% of the population.
(b)
Downhill Speed Average (DSa): The top distribution fitting candidate was Normal (44.40,0.62).
(c)
Turn Length standard deviation (TLs): The top distribution candidate was Pareto (1.3657,0.032478).
(d)
Curve Speed standard deviation (CSs): The top two distribution candidates were ExtValue (3.8081,2.6365) and Normal (5.2911,3.3464). We chose the normal distribution for this study because we believe speed has variations caused by the subject as well as other factors such as road conditions.
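A minimal sketch of this fit-validate-generate procedure for a single attribute is shown below, using SciPy for distribution fitting and the Lilliefors implementation in statsmodels. The sample values and resulting parameters are illustrative, not the study's data.

```python
# Sketch of Step 3 for one attribute: fit a candidate distribution to a
# small sample, check the fit with the Lilliefors test, then generate
# synthetic data for model training. Values below are illustrative.
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

observed_dsa = np.array([43.8, 44.1, 44.9, 44.3, 45.0, 44.6,
                         43.9, 44.5, 44.2, 44.7, 44.0, 44.4])  # example sample

mu, sigma = stats.norm.fit(observed_dsa)        # estimate Normal parameters
ks_stat, p_value = lilliefors(observed_dsa, dist="norm")
if p_value > 0.05:                              # cannot reject normality
    synthetic_dsa = stats.norm(mu, sigma).rvs(size=1000, random_state=0)
```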
Step 4. Develop and evaluate a model to predict Trust in Automation based on the identified effective attributes. Nonlinear machine learning modeling techniques (ANN, SVM, and RF) were applied to model and predict a user's Trust in Automation from vehicle data. Model effectiveness was then evaluated by randomly dividing the data into testing and evaluation sets. For the ANN model, we applied a 3-2-1 topology, with DSa, TLs, and CSs as the three input nodes, a two-node hidden layer, and one output node, Trust in Automation (TIA). Table 5 shows the results from the TIA-ANN, TIA-SVM, and TIA-RF models. The prediction accuracies for TIA-ANN and TIA-SVM were very close, but the accuracy of TIA-RF was significantly lower. Accuracy did not increase much as the sample size increased to 1,000 data sets. Accuracy is defined as 1 − ((TP_i − TT_i)/TT_i), where TP_i is the predicted trust value for data sample i and TT_i is the target trust value for sample i. The accuracy value in the tables below is the average accuracy over a given set of samples (such as 125 data sets).
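The sketch below illustrates this Stage I pipeline under stated assumptions: inputs are drawn from the Step 3 distributions, a simple linear link between inputs and trust is assumed for illustration (the pairing of generated inputs with trust targets is not specified above), and the accuracy measure uses an absolute error so that over- and under-predictions both reduce accuracy. scikit-learn's MLPRegressor stands in for the MATLAB ANN.

```python
# Sketch of the Stage I trust model: a 3-2-1 network (inputs DSa, TLs, CSs;
# one two-node hidden layer; output TIA) plus the accuracy measure defined
# above. The input-to-trust link and Pareto parameterization are assumptions.
import numpy as np
from scipy import stats
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 500
dsa = stats.norm(44.40, 0.62).rvs(n, random_state=1)               # DSa
tls = stats.pareto(1.3657, scale=0.032478).rvs(n, random_state=2)  # TLs
css = stats.norm(5.2911, 3.3464).rvs(n, random_state=3)            # CSs

z = lambda v: (v - v.mean()) / v.std()
tia = np.clip(4 + 0.6 * z(dsa) - 0.4 * z(tls) - 0.4 * z(css)       # assumed link
              + rng.normal(0, 0.3, n), 1, 7)

X = np.column_stack([dsa, tls, css])
ann = MLPRegressor(hidden_layer_sizes=(2,), activation="tanh",     # 3-2-1 topology
                   solver="lbfgs", max_iter=5000, random_state=0).fit(X, tia)

def tia_accuracy(tp: np.ndarray, tt: np.ndarray) -> float:
    """Average of 1 - |TP_i - TT_i|/TT_i over the evaluation samples."""
    return float(np.mean(1.0 - np.abs(tp - tt) / tt))

print(f"average accuracy: {tia_accuracy(ann.predict(X), tia):.3f}")
```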
Step 5. Further evaluate model accuracy using field data and other tests. The prediction accuracy of the designed models was further evaluated using other available field data and tests of robustness.
Manual and Driver Preference modes. The above models were built using data generated while subjects were driving in Autopilot mode, because the Autopilot data are likely different from the data collected in the Manual and Driver Preference modes (denoted M&D). For example, there could be a learning effect on driver performance and on perception of vehicle capability when using Autopilot a second time. However, the M&D data were used to evaluate the noise tolerance (robustness) of the developed models. Table 6 shows the testing results, which suggest the models are noise tolerant. In particular, the TIA-ANN model performed better than the TIA-SVM and TIA-RF models.
Noise Tolerance and Sensitivity Analysis. To test the robustness of the developed models, we added noise to the data to see how the models would respond in terms of accuracy. Noise was added by multiplying all the original values in the data set by a fixed factor corresponding to a 5%, 10%, or 15% deviation. For example, a 5% noise level was added using the equation Value_5% noise = (original value) × 1.05; that is, the noise values deviate 5% from the original values. This noise was applied to every data point, so all the data used for training, testing, and validation had the same noise level. Table 7 shows the results for the two models. Results suggest only a 1% to 2% difference in accuracy when the noise level increases to 15%. In one case, accuracy increased as the noise level increased, suggesting that the introduced noise happened to fit the underlying data distribution.
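A sketch of this noise-injection procedure, under the multiplicative scheme described above:

```python
# Sketch of the noise-tolerance test: scale every value by a fixed factor
# (1.05, 1.10, 1.15) and re-evaluate model accuracy at each noise level.
import numpy as np

def add_noise(data: np.ndarray, noise_pct: float) -> np.ndarray:
    """Deviate all values by noise_pct percent, e.g. 5 -> value * 1.05."""
    return data * (1.0 + noise_pct / 100.0)

X = np.random.default_rng(0).normal(44.4, 0.62, size=(500, 3))  # example data
for pct in (5, 10, 15):
    X_noisy = add_noise(X, pct)
    # ... retrain and re-evaluate the TIA models on X_noisy ...
```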
Use of additional vehicle attributes. We explored further improving model accuracy by adding additional driving attributes into the modeling process. For example, Table
4 shows that the correlation between the average distance to the
left line (TLa) and Trust in Automation was −0.79. Data from Autopilot mode on TLa,
CSa (Curve-Speed Average), TSa (Turn-Speed Average), and
TBl (Turn-Brake Length) were fitted into distributions and validated using the Lilliefors test. Statistical testing results suggested Expon(0.0172, −0.1793), ExtValueMin(41.6520, 3.8349), Pareto(0.96522, 2.9800), and Uniform(23.4088, 31.3075) were the top candidates for these attributes, respectively. Additional data sets were generated based on those distributions. Table
8 shows that when going from three to four attributes, the accuracy of the ANN model increased slightly. However, ANN accuracy did not continue to increase when using five, six, or seven attributes. Also, SVM accuracy decreased when going from three to four attributes. Because our ultimate goal is to process vehicle attribute data online in real time, we chose to use three attributes to reduce computational complexity.
4.3 Stage I Findings and Discussion
Initially, we looked to time spent on Autopilot and the number of braking events as indicators of subjects' trust in automation. However, these attributes did not strongly correlate with the individual self-assessments of trust in automation. This may be because only 20% of the driving path consisted of downhill, curve, and turn situations (see Figure
3(a)). The trust signal was stronger when focusing only on sections of the driving path that require a greater cognitive load, such as downhills, curves, and turns. Results suggest that the Downhill Speed Average (DSa), Turn Length standard deviation (TLs), and Curve Speed standard deviation (CSs) attributes yield stronger correlation coefficients with the self-assessment of trust in automation. These attributes can potentially be used to continuously assess a subject's trust in automation as the number of driving attempts increases.
Of the three machine learning models developed, ANN and SVM yielded better accuracy than RF under a variety of conditions, including added noise, sample size variations, and tests with field data from the manual and driver preference modes.
The developed models (ANN, SVM, and RF) appear to be robust and fault tolerant. Even with added noise of up to 15%, accuracy was reduced by only 2% to 3%; in some cases, accuracy increased by 2% to 3%. When tested with data not previously seen by the models (from the Manual and Driver Preference modes), the ANN model (0.72) performed better than SVM (0.66) and much better than RF (0.33).
The models need appropriate sample sizes. If the sample size is too small (fewer than 25), the models cannot find a good fit, and accuracy suffers. However, as sample size increases, model accuracy may not increase in proportion to the amount of data. Future research could include determining the right sample size when developing machine learning models.
Accuracy did not improve by more than 1% when the number of vehicle attributes (i.e., the number of input nodes) increased. This may be because the top three attributes had strong correlations with the Trust in Automation variable, whereas the other four attributes had relatively modest correlation coefficients with the self-assessment of trust in automation. Future research may include examining the extent to which these attributes are independent of one another, in addition to further study of their correlation with the performance measure (trust in automation).
For purposes of providing personalized driving assistance in real time, using fewer attributes is advantageous because computational requirements are reduced. Ultimately, vehicle attribute data may replace self-assessments of trust in the vehicle's capability (learned trust). Future directions may include using dispositional trust in place of learned trust. A machine learning model such as an ANN can be developed as a base model reflecting individual differences; the model can then improve itself as the individual drives more often and more data are generated, becoming a personalized model that adapts and grows smarter over time.
5 Stage II Model Development: Adaptive Driving Assistant Model (ADAM)
The focus of the Stage II model development process was on developing
Adaptive Driving Assistant Models (ADAM) based on ANN/SVM/RF techniques that can integrate the outputs from the four Stage I models (in Figure
1) and trigger appropriate voice instructions.
5.1 Model Development
The ADAM models were developed in three steps: (1) classify risk factors into categories and levels; (2) identify sensory device(s) for use in detecting risk factors; and (3) develop sensor fusion algorithms to integrate sensory data.
Step 1: Classify risk factors into categories and levels. Based on the literature review [
2–
5], four categories of risk factors are considered:
Speed, Distraction, Road Conditions, and
Trust. Within each factor, there are three levels of severity:
Over, Normal, and
Under. For example, if the factor is Speed, the levels would be
Over Speed, Normal, and
Under Speed. Therefore, there are 81 possible combinations (3 × 3 × 3 × 3) to be considered in this study. In addition, there are five possible types of advice that the system can provide:
Slow down, Speed up, Brake, Stop, and
Nothing.
Step 2. Identify sensory devices for use in detecting risk factors. Sensory devices were designated for monitoring each factor. Some of the data are from Tesla's CAN bus and some are from external sensory devices. Table
9 shows devices used for each factor.
Step 3. Develop sensor fusion algorithms to integrate sensory data. For the ANN model, a 4 × 3 × 1 topology was used, representing four inputs, three hidden nodes, and one output. The four inputs are speed, distraction, road conditions, and trust; the one output is the type of guidance to provide to the driver. Between the inputs and the output is a hidden layer that connects input nodes to the output node via weighted links whose activation functions enable non-linear fitting. For the SVM model, a regression model built upon the speed, distraction, road condition, and trust data sets is employed to predict the type of guidance to provide to the driver.
The ANN model was built using the MATLAB Neural Network Toolbox trainlm function, which implements the Levenberg-Marquardt backpropagation algorithm.
The SVM regression model was trained and cross-validated using the fitrsvm function in MATLAB's Statistics and Machine Learning Toolbox. fitrsvm maps the predictor data using Radial Basis Function (RBF) kernels. An SVM model has two essential parameters: cost (c) and gamma (g). Cost is the tolerance of training error, which determines the generalizability of the model. Gamma is a parameter of the RBF kernel (inversely related to the kernel width) and influences the number of support vectors, which affects training and prediction speed. To train an efficient model that neither overfits nor underfits, the values of c and g must be kept within an appropriate range. Hence, grid search and cross-validation (CV) were utilized to find the best c and g automatically. To initiate the grid search, a set of candidate c and g values is designated; based on the selected scoring standard, the best settings are determined after exhausting all combinations of parameters. To prevent the model from becoming too complicated, which may lead to overfitting, cross-validation is implemented simultaneously with the grid search: the training set is divided randomly into several subsets, and in each round one subset serves as the validation set while the others are used for training. These two mechanisms (grid search and cross-validation) were combined to tune the parameters, improving training efficiency and model performance.
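The following sketch reproduces this grid-search-with-cross-validation procedure using scikit-learn's RBF-kernel SVR in place of MATLAB's fitrsvm; the parameter grids and synthetic training data are illustrative assumptions.

```python
# Sketch of grid search over (C, gamma) with k-fold cross-validation for an
# RBF-kernel SVM regressor, standing in for MATLAB's fitrsvm procedure.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X_train = rng.uniform(1, 3, size=(500, 4))   # speed, distraction, road, trust
y_train = X_train[:, :3].sum(axis=1) + 0.5 * X_train[:, 3]  # illustrative target

param_grid = {
    "C": np.logspace(-2, 3, 6),       # cost: tolerance of training error
    "gamma": np.logspace(-4, 1, 6),   # RBF kernel parameter
}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X_train, y_train)
best_svm = search.best_estimator_     # c and g chosen to avoid over/underfitting
```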
Cross-validation was also utilized in training and tuning the Random Forest model. The maximum depth of the trees was determined after training. Setting a maximum depth for the decision trees prunes the leaves, which reduces overfitting and can remove the influence of noise.
The procedure for producing the output node values for the ANN and SVM models was as follows:
(1)
For each input type, assign relative weight for each level of severity.
(2)
Calculate the accumulated weights for each possible outcome of the output node.
(3)
Fit the weights of each outcome into a distribution.
(4)
Redistribute the data into groups based on the number of possible outcomes of the output node, so that the number of data groups in the histogram equals the number of possible outcomes. Each group of data represents the probability of one possible outcome.
Table
10 shows the weights assigned to each factor.
Since four factors are considered and each factor has three levels, there are 81 possible combinations. After assigning weights to the severity of each factor, we can fit the overall value of each possible outcome into a distribution. At the same time, we can arrange the number of data groups into a histogram based on the number of predetermined outcomes, which is five in this case. As shown in Figure
5, a normal distribution with mean 7 and standard deviation 1.4811 is a relatively good fit and happens to form five different groups with boundary values of 4.9, 6.35, 7.7, and 9.1. We can further normalize each outcome value between 0 and 1 using these group boundary values. The outcome values are calculated assuming the Speed, Distraction, and Road Condition factors are equally weighted, plus half the weight for Trust (since Trust is negatively correlated with the outcomes).
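The sketch below illustrates this output-node construction: enumerating the 81 factor-level combinations, scoring each with equal weights for Speed, Distraction, and Road Condition plus half weight for Trust, and binning the scores at the group boundaries given above. The per-level severity weights and the bin-to-advice mapping are illustrative placeholders, not the actual values from Table 10.

```python
# Sketch of the output-node construction: score all 81 factor-level
# combinations and bin the scores into the five advice classes.
from itertools import product
import numpy as np

LEVELS = {"Under": 1.0, "Normal": 2.0, "Over": 3.0}  # hypothetical weights
BOUNDARIES = [4.9, 6.35, 7.7, 9.1]                   # group boundaries (Figure 5)
ADVICE = ["Nothing", "Speed up", "Brake", "Slow down", "Stop"]  # assumed order

scores = []
for speed, distraction, road, trust in product(LEVELS.values(), repeat=4):
    # Speed, Distraction, Road Condition equally weighted; Trust at half
    # weight because it is negatively correlated with the outcome.
    scores.append(speed + distraction + road + 0.5 * trust)

bins = np.digitize(scores, BOUNDARIES)     # maps each of 81 scores to 0..4
advice = [ADVICE[b] for b in bins]         # one advice class per combination
```

With these weights, the 81 scores have mean 7 and standard deviation of about 1.47, consistent with the Normal(7, 1.4811) fit reported above.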
Figure
6 shows the data distribution, which is calculated assuming the outcome is the summation of four equally weighted factors (Speed, Road Condition, Distraction, and Trust). The distribution of the 81 possible outcomes suggests a normal distribution N(8, 1.64), which covers the range 4 to 12 with 97.5% of the population. After consolidating the nine groups of data into five groups, the data distribution resembles a uniform distribution U(0.95, 5.05). Figure
7 shows the distribution after the groups were consolidated.
5.2 Model Evaluation
To evaluate the Stage II models, we used both simulated data and field data to represent the outputs from the four Stage I models. Two approaches were used—comprehensive and historical—based on the data source. In the historical approach, data from external sources are used to represent past and current situations. In the comprehensive approach, data are generated to simulate a broad spectrum of events. This approach can include rare events, allowing possible future events to be represented. Using these two approaches together allows us to thoroughly evaluate and assess the robustness of the proposed models.
Comprehensive. For the comprehensive approach, we fit each factor value into a distribution, and then generate more data based on the underlying distribution with revised parameters as needed. Figures
8,
9, and
10 show possible distribution candidates for fitting existing data and revision of the underlying distribution to generate additional data. These are
Uniform (1,3), Normal (2,0.3), and
Triangular (1,2,3) distributions.
Tables
11,
12, and
13 show the accuracy of the ADAM sensor fusion algorithms based on the ANN, SVM, and RF methods under different distributions and numbers of data sets. Results suggest that (1) accuracy increases as sample size increases; (2) there is not much increase in accuracy after the sample size reaches 500 data sets; (3) the algorithms perform slightly better when the data follow a normal distribution; and (4) ADAM-ANN provides more stable (less variable) accuracy. With smaller samples, ADAM-ANN performs slightly better than ADAM-SVM, but ADAM-RF performs much worse.
Historical. In this approach, we use historical data collected by government agencies, insurance companies, and third-party research foundations representing events in the U.S. to generate Speed, Distracted Driving, and Road Condition data for the input nodes of the ADAM-ANN, ADAM-SVM, and ADAM-RF algorithms. To generate data on the Trust in Automation factor, we rely on research findings from Dikmen & Burns [
66].
Dikmen and Burns [
66] surveyed Tesla drivers about their confidence in Autopilot and common features. Overall, participants reported high levels of trust in Autopilot (M = 4.02, SD = 0.65) and moderate levels of initial trust (M = 2.83, SD = 0.82) on 5-point Likert scales. Trust in Autopilot was positively correlated with frequency of Autopilot use, self-rated knowledge about Autopilot, ease of learning, and usefulness of Autopilot displays.
Table
14 shows categories of factors and the frequency with which they affect driving, according to a 2016 survey by AAA [
67]:
Overall driving distraction categories include use of electronics (talking on the phone, texting, reading emails), fatigue, and driving while impaired. Assuming these three categories are independent of one another, the percentages can be estimated as 18% for driving under the influence, 20% for distracted driving, and 62% for driving in a tired or normal condition (because driving while tired does not necessarily cause accidents, this group was combined with the normal driving condition group).
The AAA survey [
67] also summarized how drivers behave when speeding:
•
Nearly half of all drivers (48 percent) report going 15 mph over the speed limit on a freeway in the past month, while 15 percent admit doing so fairly often or regularly.
•
About 45 percent of drivers report going 10 mph over the speed limit on a residential street in the past 30 days, and 11 percent admit doing so fairly often or regularly.
Based on data from The Washington Post [
68] and Caring.com [
69], 20% of drivers who are age 65 or above tend to drive under the speed limit. Table
15 summarizes these statistics and classifies the four categories into three groups with associated percentages: Group A, over the speed limit, 19%; Group B, under the speed limit, 18.3%; and Group C, within the speed limit, 62.7%.
According to ten-year averages from 2007 to 2016 analyzed by Booz Allen-Hamilton based on NHTSA data [
5], on average, over 5,891,000 vehicle crashes occur each year. Approximately 21% of these crashes—nearly 1,235,000—are weather-related. Weather-related crashes are defined as those crashes that occur in adverse weather (i.e., rain, sleet, snow, fog, severe crosswinds, or blowing snow/sand/debris) or on slick pavement (i.e., wet pavement, snowy/slushy pavement, or icy pavement). The vast majority of weather-related crashes happen on wet pavement (70%) and during rainfall (46%). A much smaller percentage of weather-related crashes occur during winter conditions: 18% during snow or sleet, 13% on icy pavement, and 16% on snowy or slushy pavement. Only 3% happen in the presence of fog [
5]. Table
16 summarizes these statistics about accidents caused by road and weather conditions.
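Under stated simplifying assumptions (independence of the categories, and treating the 21% weather-related crash share as the probability of adverse road conditions), input data can be sampled from these historical percentages as in the sketch below; the Trust values follow the Dikmen & Burns statistics [66].

```python
# Sketch of generating Stage II input data from the historical percentages
# summarized in Tables 14-16. The adverse-road probability and category
# labels are simplifying assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
N = 1000  # number of synthetic driving scenarios

speed = rng.choice(["Over", "Under", "Normal"], size=N,
                   p=[0.19, 0.183, 0.627])        # Table 15 groups A/B/C
distraction = rng.choice(["Impaired", "Distracted", "Normal"], size=N,
                         p=[0.18, 0.20, 0.62])    # AAA survey estimates
road = rng.choice(["Adverse", "Normal"], size=N,
                  p=[0.21, 0.79])                 # weather-related share [5]
trust = np.clip(rng.normal(4.02, 0.65, size=N), 1, 5)  # Dikmen & Burns [66]
```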
Based on the above historical statistics for Speeding, Distracted Driving, and Road Conditions, data sets were generated for modeling and evaluating the ADAM-ANN and ADAM-SVM sensor fusion algorithms. Table 17 shows that both algorithms performed well, with accuracy ranging from 87% to 95%.
5.3 Stage II: Findings and Discussion
In Stage II, we designed and evaluated three machine learning models (ANN, SVM, and RF) for providing driving advice to drivers of autonomous vehicles using data generated from historical statistics and from fitted distributions. For the models based on historical statistics, accuracy ranged from 90% to 95% for ANN, 87% to 94% for SVM, and 85% to 95% for RF. For the models based on fitted distributions, accuracy ranged from 85% to 89% for ANN, 80% to 92% for SVM, and 59% to 73% for RF. These results suggest that (1) all three models perform better when using data generated from historical statistics than from fitted distributions, possibly because the fitted-distribution data have greater variation than the historical statistics; (2) the ANN and SVM models are a good fit for this application; (3) the ANN model seems most stable and adaptive, gradually improving its accuracy as sample size increases; and (4) the SVM model in one instance improved its accuracy faster than the other two models (Table
11, 250 data sets). Overall, the modeling methodology appears sound and yields good results. Future directions include (1) using a hybrid SVM and ANN modeling approach to improve model accuracy at different sample sizes; (2) development of a novel model based upon an ANN topology for real-time data processing; and (3) development of a plug-in portable hardware system that incorporates sensory devices and machine learning algorithms to provide real-time personalized voice advice to a driver.