1 Introduction

Frustration is a negative affective state that occurs when goal-directed behavior is blocked (e.g. [1]). Because driving is normally done for a purpose (e.g. going to work, taking the kids to school or quickly driving to the supermarket), drivers frequently experience frustration when they face obstacles, such as traffic jams and red lights, or when they have trouble programming their in-vehicle navigation or infotainment systems due to badly designed interfaces. Frustration can lead to aggressive behaviors (e.g. [2]) and can affect driving behavior due to negative effects on cognitive processes relevant for driving [3]. In addition, the negative user experience that comes along with frustration impacts user interaction with technical systems in general and has a significant influence on the acceptance of technical systems [4]: the lower the quality of the user experience, the lower the acceptance and thus also the willingness to use and buy a technical system. However, frustrating experiences when using technical systems, especially in complex traffic, cannot always be avoided by design. Here, affect-aware vehicles that recognize the driver’s degree of frustration and, based on this, offer assistance to reduce frustration or mitigate its negative effects promise a remedy (e.g. [5,6,7,8]). As a prerequisite for the development of frustration-aware vehicles, a method for recognizing frustration in real time is needed. Humans communicate emotions by changing their facial expression, so understanding a person’s facial expression can help to infer his or her current emotional state [9]. Hence, automated recognition of a frustrated facial expression could be an important building block for developing affect-aware vehicles. Notably, recent studies identified facial activation patterns that correlate with the experience of frustration during driving and that may be automatically classified as an indicator of frustration [10, 11].

Consequently, the goal of this work is to present a facial expression classifier that is capable of determining from video recordings whether or not a frustrated facial expression is shown. To demonstrate its real-time capability, we integrated the classifier into a demonstrator, the Frust-O-Meter, which works as part of a realistic driving simulator. In the following, we describe the development and performance of the classifier, introduce the modules of the Frust-O-Meter as well as their interplay, and finally discuss potential improvements of the demonstrator together with ideas for further research.

2 Facial Expression Classifier

2.1 Short Description of Data Set

The data set used for training and validation of the classifier stems from an earlier driving simulator study with 30 participants conducted to investigate facial muscle activities that are indicative of frustration. Participants had to drive through an urban street with the task to quickly deliver a parcel to a client. Obstacles, such as red traffic lights, traffic jams, or construction sites, blocked the participants and thus induced frustration (for details, see [10, 11]). Participants’ faces were videotaped using a standard IP cam (resolution of 1280 × 720 pixels) at 10 frames per second. The software package Emotient FACET (Imotions, Copenhagen, Denmark) was used to extract the evidence of activity of 19 facial action units (AUs) frame-wise from the facial videos. Using a data-driven clustering approach based on the AU data averaged in windows of 1 s, five facial expressions were identified that predominantly occurred in the data set, corresponding to a frustrated facial expression, two slightly different neutral expressions (neutral 1 and neutral 2), smiling and frowning (for details, see [11]). These facial expressions were used as labels for the classifier development in this paper. Thus, the final data set contained the activity information for 19 AUs together with a facial expression label (neutral 1, smiling, frowning, frustrated or neutral 2) for each second from 30 participants driving for roughly 10 min (30 participants × 10 min × 60 s ≈ 18,000 data points).
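As an illustration, the following is a minimal sketch of this 1 s averaging step in Python (pandas), assuming the FACET export is available as a CSV file with a timestamp column and one evidence column per AU; the file name and column layout are assumptions and may differ from the actual export format.

```python
import pandas as pd

# Hypothetical FACET export: frame-wise AU evidence at 10 fps with a
# 'timestamp' column (in seconds) and 19 columns named 'AU01', 'AU02', ...
facet = pd.read_csv("facet_export_participant_01.csv")

au_cols = [c for c in facet.columns if c.startswith("AU")]

# Average the frame-wise evidence in non-overlapping 1 s windows, yielding
# one 19-dimensional feature vector per second (the data points used below).
facet["second"] = facet["timestamp"].astype(int)
windowed = facet.groupby("second")[au_cols].mean()
```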

2.2 Classifier Set-up for Frame-to-Frame Frustration Classification

To classify the labeled data, a multi-layer perceptron (MLP) was used. This type of supervised learning method uses interconnected layers of neurons to learn and generalize over training examples. An MLP learns by adjusting the initially random weights attached to the connections between neurons in order to minimize the output error. The algorithm used to adjust the weights is the backpropagation algorithm [13], which realizes gradient descent on the output error: by repeatedly adjusting the weights along the negative error gradient, the net learns to discriminate the different classes of data.
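For reference, the weight update that backpropagation realizes for each weight can be written as follows (with learning rate η and output error E):

```latex
w_{ij} \leftarrow w_{ij} - \eta \, \frac{\partial E}{\partial w_{ij}}
```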

Feature vectors with 19 dimensions (corresponding to the 19 AUs) were fed into the net with a batch size of 32. Three fully connected hidden layers (32, 16 and 16 neurons) were used to project onto the output layer. The activation of the output layer (5 neurons) was calculated with a softmax function to generate the probabilities for each label. The argmax of the output layer returned the predicted label for each sample. The classifier was implemented in Python; the computational graph underlying the neural net was written with the TensorFlow package.
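A minimal sketch of this architecture using the Keras API of TensorFlow is shown below; the ReLU hidden-layer activation and the Adam optimizer are assumptions, as the text above only specifies backpropagation-based training.

```python
import tensorflow as tf

# Sketch of the described MLP: 19 AU inputs, three fully connected hidden
# layers (32, 16, 16 neurons) and a 5-class softmax output layer.
# Hidden-layer activation (ReLU) and optimizer (Adam) are assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(19,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# The predicted label is the argmax over the five class probabilities:
# predicted_labels = model.predict(batch_of_au_vectors).argmax(axis=1)
```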

2.3 Evaluation Procedure

Before the training of the artificial neural network was started, 20% of the data were randomly split off as a hold-out set for later testing and never used during training. The remaining 80% of the data were split into 70% training and 30% validation data. In each epoch, both sets were randomly shuffled; the training data were used for training, and the validation set was used at the end of the epoch to check the performance and test for potential overfitting of the net. After training was finished, the hold-out set was used to test the performance of the net on previously unseen data. Finally, the structure and weights were saved for later use in the demonstrator.
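A sketch of this procedure is given below, assuming the feature matrix X, the label vector y and the model from the sketch above; the scikit-learn splitting utility, the random seed and the number of epochs are assumptions.

```python
from sklearn.model_selection import train_test_split

# 20% hold-out test set, never used during training.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, shuffle=True, random_state=42)

# Remaining 80%: 70% training, 30% validation.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.30, shuffle=True, random_state=42)

# Shuffle each epoch; validate at the end of every epoch to monitor overfitting.
model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          batch_size=32, epochs=50, shuffle=True)

# Performance on previously unseen data.
test_loss, test_acc = model.evaluate(X_test, y_test)

# Save structure and weights for later use in the demonstrator.
model.save("frustration_mlp.h5")
```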

2.4 Classification Results

The classification result shown in Fig. 1 represents the performance of the trained net on the test set. The true labels are plotted on the y-axis against the predicted labels on the x-axis. An overall accuracy of 69% was reached on the test set with the MLP. With 93%, the accuracy was highest for the frustrated facial expression, while most of the other expressions also reached relatively high accuracies (neutral 1: 70%, smiling: 71%, frowning: 79%). Only the expression ‘neutral 2’ had a lower accuracy (41%) and was repeatedly misclassified as frowning. As our goal was to construct a classifier for detecting the facial expression of frustration, the performance of this net was considered acceptable.

Fig. 1. Confusion matrix depicting the classification accuracy of the employed classifier.
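A minimal sketch of how such a row-normalized confusion matrix can be produced with scikit-learn, assuming X_test, y_test and the trained model from the sketches above; the label order is an assumption.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Assumed mapping of class indices to expression names.
labels = ["neutral 1", "smiling", "frowning", "frustrated", "neutral 2"]

# Predicted class = argmax of the softmax output for each test sample.
y_pred = model.predict(X_test).argmax(axis=1)

# Row-normalized confusion matrix: true labels on the y-axis,
# predicted labels on the x-axis, as in Fig. 1.
ConfusionMatrixDisplay.from_predictions(
    y_test, y_pred, display_labels=labels, normalize="true")
plt.show()
```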

3 Demonstrator: The Frust-O-Meter

In order to demonstrate the real-time capability of the classifier, we integrated it into a demonstrator called the Frust-O-Meter. The Frust-O-Meter is currently integrated into a realistic driving simulator and consists of five modules: (1) a webcam, (2) a preprocessing unit, (3) a user model, (4) an adaptation unit and (5) a user interface (see Fig. 2 for a sketch of the architecture). The modules are detailed in the following:

Fig. 2. Sketch of the architecture of the Frust-O-Meter.

  • Webcam: The webcam is a standard Logitech C920 webcam recording with a resolution of 1920 × 1080 at a frame rate of 30 fps. The camera is mounted on the dashboard to record the participant’s face from a frontal position. The video data are streamed to the preprocessing unit.

  • Preprocessing unit: The purpose of the preprocessing unit is to extract the frame-wise activation of the facial AUs from the video stream and to make it available to the user model for further processing. In the current version, the commercial software package Emotient FACET (Imotions, Copenhagen, Denmark) is used for this step; it estimates the evidence of activity for 19 AUs for each frame. Thus, a 19-dimensional vector for each frame is passed on to the user model.

  • User model: In the user model, the preprocessed facial activation data are interpreted to obtain an estimate of the user’s current degree of frustration. For this, the model first classifies the incoming AU data with the trained facial expression classifier described above into the currently shown facial expression (two different neutral expressions, smiling, frowning or frustrated; see Sect. 2). Following the result reported in [11] that the frequency of showing the frustrated facial expression correlates with the subjective frustration experience, the occurrence of this facial expression is integrated over the last 20 data points in order to estimate the current degree of frustration. This means that the frustration estimate can take values between 0 (no frustrated facial expression was shown during the last 20 data points) and 20 (the frustrated facial expression was shown permanently during the last 20 data points). The result of the frustration estimation is passed on to the adaptation unit and the user interface (a minimal sketch of this integration step and the adaptation trigger is shown after this list).

  • Adaptation unit: The idea of the adaptation unit is to select and execute an appropriate adaptation strategy that supports the user in mitigating her or his currently experienced level of frustration or helps to reduce the negative effects frustration has on (driving) performance. Currently, this is realized in a very simple form by randomly playing one of two happy songs (either Hooked on a Feeling in the version by Blue Swede or Have You Ever Seen the Rain by Creedence Clearwater Revival) via loudspeakers for about one minute once the frustration estimate reaches a threshold value of 15 or above.

  • User interface: Finally, the purpose of the user interface is to present the frustration estimate as well as the output of the facial expression classifier to the user. In the simulator, the user interface is shown on a monitor in the center console of the vehicle mock-up. The upper half of the monitor contains a display of the frustration estimate in the style of a speedometer (which explains Frust-O-Meter as the name of the demonstrator), in which values above 15 are displayed in red, while the remainder is shown in white (see Fig. 2). The lower half contains smileys for the facial expressions together with a time series display of the classifier output over a configurable time window.
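The following is a minimal sketch of the integration step in the user model and the threshold-based adaptation trigger described above; the class index, the song file names and the pygame playback backend are assumptions made for illustration.

```python
import random
from collections import deque

import pygame  # assumed audio backend; any playback library would do

FRUSTRATED = 3        # assumed index of the 'frustrated' class in the classifier output
WINDOW = 20           # number of 1 s classifier outputs to integrate
THRESHOLD = 15        # frustration estimate at which the adaptation is triggered
SONGS = ["hooked_on_a_feeling.mp3",
         "have_you_ever_seen_the_rain.mp3"]   # hypothetical file names

pygame.mixer.init()
recent_labels = deque(maxlen=WINDOW)

def update_user_model(predicted_label):
    """Integrate the frustrated facial expression over the last WINDOW data points."""
    recent_labels.append(predicted_label)
    return sum(1 for label in recent_labels if label == FRUSTRATED)   # 0 .. 20

def adapt(frustration_estimate):
    """Play a randomly chosen happy song once the estimate reaches the threshold."""
    if frustration_estimate >= THRESHOLD and not pygame.mixer.music.get_busy():
        pygame.mixer.music.load(random.choice(SONGS))
        pygame.mixer.music.play()
```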

The modules of the Frust-O-Meter work in real time, so that users can drive through a simulated urban environment (realized in Virtual Test Drive, VIRES Simulationstechnologie, Bad Aibling, Germany) with frustrating situations, such as construction sites or red lights (cf. [2, 10, 11]). In this way, it is possible for users to experience a real-time adaptation of a system to their current frustration level.

4 Discussion and Outlook

Here, we presented a real-time capable classifier for recognizing a frustrated facial expression from video streams and its integration into the Frust-O-Meter, a demonstrator of a frustration-aware vehicle. The Frust-O-Meter links a real-time estimation of the degree of frustration from the facial expression with a simple adaptation and a user interface that communicates the current level of frustration. The user model of the demonstrator estimates the current degree of frustration based on a temporal integration of facial expressions classified as frustrated. As a classifier, we used a multi-layer perceptron that was trained on video recordings of 30 participants experiencing frustration in a driving simulator study. The adaptation to the user’s frustration is currently realized by playing a happy song once a certain degree of frustration is reached.

While the Frust-O-Meter is a useful means to let people experience the idea of a real-time adaptation to their degree of frustration, many challenges need to be tackled to develop a fully functioning frustration-aware vehicle. For instance, affective states like frustration are multi-component processes [14, 15] that not only manifest in the facial expression, but also come along with changes in cognitive appraisals, physiology, gestures or prosody. Therefore, information from other sensors besides a webcam (e.g. electrocardiogram, infrared imaging, or microphone [16,17,18,19]) should be integrated into the user model to improve the frustration estimation. Of course, extending the sensor set also demands a more sophisticated preprocessing unit and user model. Moreover, the only possible adaptation was playing a happy song randomly chosen from a set compiled based on the authors’ preferences. Although music has the potential to improve drivers’ mood [20], other strategies may be even better suited. Empathic voice assistants that support drivers in dealing with their frustration or help to overcome the causes of frustration (e.g. by offering help when dealing with a badly designed interface) seem very promising [8, 21]. To realize this, the user model needs information about the context to derive the cause of frustration, and the adaptation unit needs suitable approaches to select and apply the best possible strategy, as described for example in [22]. To sum up, notwithstanding the outlined ideas for further research, the Frust-O-Meter is an elegant way to demonstrate a frustration-aware vehicle based on automated recognition of facial expressions from video recordings.