Keywords

1 Introduction

Psychological scales are commonly used to screen for psychological problems, based on theories about the attentional bias of people with emotional disorders [1]; examples include the Depression Anxiety Stress Scale (DASS) [2], the Cognitive Emotion Regulation Questionnaire (CERQ) [3] and the Mini International Neuropsychiatric Interview [4]. However, scales have some shortcomings: children may not understand the questions, and subjects may deliberately choose options that do not reflect their actual status.

Research on attention bias and psychological problems using image stimuli has the advantages of objectivity and direct observation of reactions. It has become an important method in psychological research, but the relationship between image semantics and psychological status remains difficult to establish. Anderson et al. [14] introduced text and natural scene images to capture the phenomenon of negative attraction. More image-related elements have gradually been introduced into behavioral experiments, such as informative pictures and emotional faces [5, 6]. Bao et al. [7] conducted a novel study on the semantic mapping between the MMPI and scene images, providing an affective image library with images labeled as positive or negative. Response time is widely used in psychological research as an important characteristic. Li et al. [8] and Wang [12] proposed a paradigm based on natural scene images and emotional face pictures, using keyboard response time to distinguish between different people's mental states. These studies focused on observing different people's responses by analyzing how people perceive and process image stimuli.

In addition to the keyboard, eye trackers have been used in psychological experiments in recent years. Non-contact eye movement information is captured directly, with a high degree of acceptance and precision, and many scholars have adopted eye movement methods for analysis. Bloem et al. [11] used visual images to discover the role of gaze in mental imagery and memory. Duque and Vázquez [10] used positive and negative emotional faces with eye tracking to observe the double attention bias in clinical depression.

However, these methods use the eye movement heat map or fixation points as features, ignoring eye movement path length and response time, which reflect the subjects' overall status in the experiment. On the other hand, methods based on keyboard response time alone may not fully reflect people's attention, since response time is related to several factors, such as the saliency of the facial expression and the subject's age. Eye movement feature extraction, which relies on the experimental paradigm and the eye movement data processing algorithm, must reflect the subject's psychological characteristics accurately. Furthermore, combining keyboard and eye movement data describes people's psychological status better: the fusion of keyboard response time and eye movement reflects the subject's unified response and generates high-dimensional features, which also improves the classification accuracy.

In this paper, we present a complete system with a new paradigm in which face images are shown on the left or right side of the background with equal probability. We collect the keyboard response time for identifying emotional faces and record eye movement during the experiment. By analyzing the collected data, we find significant discrepancies between normal and depressed people. The accuracy of the experiment is improved compared with the methods of Li et al. [8] and Wang [12].

2 Experiment

2.1 Materials and Participants

We use 16 emotional face images (8 positive and 8 negative) from the Taiwanese Facial Expression Image Database [16] as the foreground stimuli and 80 emotional images (40 positive and 40 negative) chosen from ThuPIS [8] as the background scenes. All face images are converted to grayscale. All images in ThuPIS were chosen from the IAPS [17] and Google and screened based on the method of Bao et al. [7]. Samples of the face and scene images are shown in Fig. 1.

30 patients with depressive disorder (23 males and 7 females; age M = 22.5, SD = 3.86) were recruited from two hospitals, and 30 normal controls (18 males and 12 females; age M = 23.2, SD = 0.69) were university students.

Fig. 1.
figure 1

Examples of facial expressions and scene pictures. The first row shows positive faces and scenes; the second row shows negative faces and scenes.

2.2 Model

The whole system is divided into 4 parts as shown in Fig. 2.

Experimental paradigm. The purpose of the experimental paradigm is to observe and analyze the subjects' response data through emotional images. Details are given in Sect. 2.3.

Data collection. The eye movement data are collected by a Tobii eye tracker, which is widely used in psychological research. The eye tracker records eye movement characteristics during visual information processing, since psychological activities have a direct or indirect relationship with eye movement. Each subject was calibrated using the Tobii EyeX Interaction application before the start of the test. The subjects' response times are collected by keyboard or button.

Characteristics extraction. The collected data is converted into a feature vector by a fusion algorithm which is introduced in Sect. 3.1.

Data analysis. In this system, a Support Vector Machine (SVM) [16] is used to classify normal and depressed people; we also use SPSS (Statistical Product and Service Solutions) for significance testing.

2.3 Procedure

The experiment is a Competing-Priming (C-P) effect experiment. Compared with former research [7, 8], it improves the placement of the face images: they are shown on the left or right side of the background with equal probability. The participants are required to read the instructions on the screen. The procedure of this experiment is shown in Fig. 2. Participants are first given 20 practice trials and are then asked to complete 80 formal experimental trials. In each trial, we present the scene background, and an emotional face appears randomly on the left or right side after 500–1000 ms; subjects then make a judgment by pressing a button. This study focuses on the competing and priming effects of the background. The eye tracking path, response time and accuracy of each trial are recorded.

Fig. 2.
figure 2

The system model.

3 Reaction Characteristics Extraction

3.1 Extraction Algorithm

Eye Movement. F(x, y, z, t) is obtained at each moment by the Tobii eye tracker, indicating that the fixation point is at (x, y) and the distance from the screen to the eye is z at time t. The coordinate system is shown in Fig. 3.

Fig. 3.
figure 3

Coordinate and procedure.

During a single trial, the background image appears at time \(t_1\), the foreground face appears at time \(t_2\), and the subject presses the key at time \(t_3\). We then construct three sets A, B, C from these three times. Set A = \(\{(x, y, t) | t_1<t<t_2\}\) represents the subject's eye movement from the appearance of the background to the appearance of the face; during this time the subject focuses on the background image, so we call it the cognitive period. Set B = \(\{(x, y, t) | t_2<t<t_3\}\) covers the period from the appearance of the face until the subject makes a decision; here the subject processes the foreground and background images and makes a button selection, so we call it the selective period. Set C = \(\{(x, y, t) | t_1<t<t_3\}\), the union of A and B, represents the eye movement data of the whole trial.
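The A/B/C partition can be sketched as follows (a minimal illustration; the function and variable names are ours, not from the authors' implementation):

```python
def split_periods(samples, t1, t2, t3):
    """Partition gaze samples (x, y, t) into the cognitive period A
    (background onset t1 -> face onset t2), the selective period B
    (face onset t2 -> key press t3), and the whole trial C = A ∪ B."""
    A = [(x, y, t) for (x, y, t) in samples if t1 < t < t2]
    B = [(x, y, t) for (x, y, t) in samples if t2 < t < t3]
    C = A + B
    return A, B, C
```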

For a set of data with length n, (\(x_1\), \(y_1\), \(t_1\)), ..., (\(x_n\), \(y_n\), \(t_n\)), we process the data as shown in Fig. 4. Steps 1 and 2 check the continuity and length of the data, where Distance(i, i + 1) is the pixel distance between points i and i + 1. Step 3 determines whether the data points lie within the screen. The final output is obtained by merging all steps.

Fig. 4.
figure 4

Eye movement data preprocessing. We determine the continuity and integrity of the data through three steps.
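As an illustration, the three checks of Fig. 4 might look like the following sketch; the screen size and the continuity and length thresholds here are assumptions, since the paper does not state them:

```python
import math

# Assumed constants for illustration only (not values from the paper).
SCREEN_W, SCREEN_H = 1920, 1080
MAX_JUMP_PX = 300      # Step 1: continuity - assumed maximum gap between samples
MIN_POINTS = 10        # Step 2: assumed minimum number of samples per trial

def is_valid_trial(points):
    """points: list of (x, y, t). Return True if the trial passes all checks."""
    # Step 2: the trial must contain enough data points
    if len(points) < MIN_POINTS:
        return False
    # Step 1: continuity - no large jump between consecutive samples
    for (x1, y1, _), (x2, y2, _) in zip(points, points[1:]):
        if math.hypot(x2 - x1, y2 - y1) > MAX_JUMP_PX:
            return False
    # Step 3: every point must lie within the screen
    return all(0 <= x < SCREEN_W and 0 <= y < SCREEN_H for (x, y, _) in points)
```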

Eye movement path length is calculated as:

$$\begin{aligned} L = \sum _{i=1}^{n-1} \sqrt{(x_{i+1}-x_i)^2+(y_{i+1}-y_i)^2}. \end{aligned}$$
(1)
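Eq. (1) translates directly into code, for example:

```python
import math

# Direct implementation of Eq. (1): the sum of Euclidean distances
# between consecutive gaze samples.
def path_length(points):
    """points: list of (x, y) gaze coordinates in pixels."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```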

Salvucci and Goldberg [9] summarized several methods for detecting fixation points, including the I-VT, I-HMM, I-DT, I-MST and I-AOI algorithms. In this paper we use the I-VT (fast) and I-DT (accurate and robust) algorithms. The I-VT algorithm calculates point-to-point velocities, labels each point below a velocity threshold as a fixation point, and collapses consecutive fixation points into fixation groups; the velocity threshold is set to 900 pixels/second. The I-DT algorithm uses a dispersion threshold and a duration threshold. Since the image is a two-dimensional signal, we use the Euclidean distance instead of the Manhattan distance:

$$\begin{aligned} \text {Dispersion } D = \max _{i,j\in \{1,2,\ldots ,n\}} \sqrt{(x_{i} - x_{j})^2 +(y_{i} - y_{j})^2} \end{aligned}$$
(2)

The dispersion threshold is set to 30 pixels, corresponding to \(1/2^{\circ }\) to \(1^{\circ }\) of visual angle. The duration threshold is set to 83 ms (5 sampling intervals of the eye tracker).
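A sketch of the I-DT detector with the Euclidean dispersion of Eq. (2) follows; it uses the 30-pixel and 83-ms thresholds from the text, while the function names and window-handling details are our own reconstruction of the algorithm in [9]:

```python
import itertools, math

DISPERSION_PX = 30.0   # dispersion threshold from the text
DURATION_S = 0.083     # duration threshold from the text (83 ms)

def dispersion(window):
    """Maximum pairwise Euclidean distance among (x, y, t) samples (Eq. 2)."""
    return max((math.hypot(a[0] - b[0], a[1] - b[1])
                for a, b in itertools.combinations(window, 2)), default=0.0)

def idt_fixations(points):
    """points: (x, y, t) samples sorted by t; returns fixation centroids."""
    fixations, i, n = [], 0, len(points)
    while i < n:
        # initial window: the samples needed to span the duration threshold
        j = i
        while j < n and points[j][2] - points[i][2] < DURATION_S:
            j += 1
        if j == n:          # remaining samples cannot span the duration
            break
        window = points[i:j + 1]
        if dispersion(window) <= DISPERSION_PX:
            j += 1
            # grow the window while dispersion stays under the threshold
            while j < n and dispersion(window + [points[j]]) <= DISPERSION_PX:
                window.append(points[j])
                j += 1
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append((cx, cy))
            i = j
        else:
            i += 1          # slide the window start forward
    return fixations
```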

Fig. 5.
figure 5

Examples of eye tracking paths. Each white circle represents a fixation point. All 4 pictures contain a positive face. The pictures show the eye movement paths of normal people (first row) and depressed people (second row) (Color figure online)

A sample eye movement path is shown in Fig. 5. The path starts red and changes gradually through green to blue. The face image appears at the moment the line turns green; that is, the red-to-green path is the cognitive path and the green-to-blue path is the selective path.

Response Time. We calculate the mean and variance of the collected data, which is divided into four groups based on the combination of foreground and background. The specific algorithm is shown in Algorithm 1.

figure a

The purpose of data preprocessing steps 1–4 is to remove trials affected by misunderstanding of the experimental requirements or by lack of concentration, including trials in which participants were distracted. Step 5 removes abnormal data caused by external factors such as software abnormalities or database bugs.

We define the subscript 0 as negative and 1 as positive, so that 01 denotes the combination of negative scenes and positive face images, and so on. We therefore obtain 4 response-time means and their overall mean (\(M_{00}\), \(M_{01}\), \(M_{10}\), \(M_{11}\), \(M_M\)), and 4 standard deviations and their mean (\(STD_{00}\), \(STD_{01}\), \(STD_{10}\), \(STD_{11}\), \(STD_M\)).
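The grouped statistics can be sketched as follows, assuming each trial is stored as (scene valence, face valence, response time) with the 0/1 convention above (the data layout is our assumption, not the authors' format):

```python
import statistics

def grouped_stats(trials):
    """trials: list of (scene_valence, face_valence, response_time),
    with valence 0 = negative and 1 = positive."""
    groups = {(s, f): [] for s in (0, 1) for f in (0, 1)}
    for scene, face, rt in trials:
        groups[(scene, face)].append(rt)
    # per-group mean and (population) standard deviation
    M = {k: statistics.mean(v) for k, v in groups.items()}
    STD = {k: statistics.pstdev(v) for k, v in groups.items()}
    M_M = statistics.mean(M.values())      # overall mean of the 4 means
    STD_M = statistics.mean(STD.values())  # mean of the 4 standard deviations
    return M, STD, M_M, STD_M
```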

3.2 Significance Test

We perform significance tests for the mean times (\(M_{00}\), \(M_{01}\), \(M_{10}\), \(M_{11}\), \(M_M\)), path lengths (\(L_{00}\), \(L_{01}\), \(L_{10}\), \(L_{11}\), \(L_M\)) and fixation point counts (\(P_{00}\), \(P_{01}\), \(P_{10}\), \(P_{11}\), \(P_M\)).

The independent-sample t-test uses t-distribution theory to infer the probability that an observed difference arises by chance, so as to compare two group means. In this context, significance is examined with respect to the emotional attributes of the background and foreground and the different mental states of the subjects.

In the significance test, the value F represents the ratio of the regression model's explained variance to the residual variance, and the Sig value is calculated from F.
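For illustration, the equal-variance independent-sample t statistic can be computed as below (a minimal sketch; in practice SPSS, or scipy.stats.ttest_ind, also returns the p-value):

```python
import math, statistics

def t_statistic(a, b):
    """Equal-variance independent-sample t statistic for groups a and b."""
    na, nb = len(a), len(b)
    ma, mb = statistics.mean(a), statistics.mean(b)
    # pooled variance of the two groups
    sp2 = ((na - 1) * statistics.variance(a) +
           (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))
```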

3.3 SVM

We use an SVM to discriminate between normal and depressed people's psychological status. The data produced by our system are \(M_i\)(\(M_{00i}\), \(M_{01i}\), \(M_{10i}\), \(M_{11i}\), \(M_{Mi}\)) with labels \(Y_i \in \{-1,1\}\), i = 1, 2, ..., N. We suppose that the first q samples are positive and the remaining N-q samples are negative. Two issues arise in the practical application of our system.

(i) The data collection is unbalanced between the two groups: the depressed subjects' data are fewer than the normal subjects' data.

(ii) Under a given false-alarm probability, a large-scale screening system needs higher accuracy on negative samples.

In view of these problems, we use two different penalty factors \(C_+\), \(C_-\) in place of the single factor C in the SVM algorithm, so the problem becomes:

$$\begin{aligned} min\varphi (\omega )=\frac{1}{2}\Vert \omega \Vert ^2+C_+\sum _{i=1}^q\xi _i+C_-\sum _{i=q+1}^N\xi _i \end{aligned}$$
(3)
$$\begin{aligned} s.t. \left\{ \begin{array}{lll} Y_i(\omega \cdot M_i+b)-1+\xi _i \ge 0 \\ \xi _i \ge 0 \end{array} i=1,2,\ldots N \right. \end{aligned}$$
(4)

Using the Lagrangian function, the problem turns into Eqs. 5 and 6:

$$\begin{aligned} \min Q(\alpha )=\frac{1}{2}\sum _{i=1}^N\sum _{j=1}^N\alpha _i\alpha _jY_iY_jM_i\cdot M_j-\sum _{i=1}^N\alpha _i \end{aligned}$$
(5)
$$\begin{aligned} s.t. \left\{ \begin{array}{lll} 0 \le \alpha _i \le C_+,\ i=1,2,\ldots ,q \\ 0 \le \alpha _j \le C_-,\ j=q+1,\ldots ,N\\ \end{array} \qquad \sum _{i=1}^N Y_i \alpha _i=0 \right. \end{aligned}$$
(6)

After iterating with the SMO algorithm, we find the best set of \(\alpha _i\) for the separating hyperplane. Setting \(C_->C_+\) makes the weight of the negative samples greater than that of the positive samples, which moves the classification hyperplane closer to the positive samples and thus achieves the purpose of screening. The same analysis also applies to path length, fixation points and all combinations of these features.
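The asymmetric penalties \(C_+\), \(C_-\) of Eq. (3) correspond to per-class weights in standard soft-margin SVM implementations. Below is a minimal sketch with scikit-learn on synthetic data (the real features are the response-time and eye-movement vectors; everything here is made up for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-ins for the two groups of subjects.
rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 0.5, size=(20, 5))    # positive (depressed) samples
X_neg = rng.normal(-1.0, 0.5, size=(40, 5))   # negative (normal) samples
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 20 + [-1] * 40)

# class_weight scales C per class: making C_- larger than C_+ pushes the
# hyperplane toward the positive samples, as described in the text.
clf = SVC(kernel="linear", C=1.0, class_weight={1: 1.0, -1: 4.0})
clf.fit(X, y)
train_acc = clf.score(X, y)
```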

Fig. 6.
figure 6

The PR and ROC curves of single features and the fusion feature

4 Result

The histograms of the eye tracking length, fixation points and response time are shown in Fig. 7. The significance analysis of the characteristics is shown in Tables 1, 2 and 3 (S = significant, NS = not significant).

Fig. 7.
figure 7

The histograms of eye tracking length, fixation points and response time.

We use these data to train classifiers through cross-validation and then distinguish the two types of people, training SVM models on each separate feature and on their fusion. The results are shown in Table 4, and the PR and ROC curves are shown in Fig. 6. There are some inflection and turning points in the curves because of the scale of the data and a few classification errors.
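The cross-validated comparison of a single feature against the fusion feature can be sketched as follows, with synthetic stand-ins for the real feature vectors (the dimensions are illustrative; the 25/24 split mirrors the screened data):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-ins for one feature set and a fused feature vector.
rng = np.random.default_rng(1)
y = np.array([1] * 25 + [-1] * 24)
single = rng.normal(y[:, None] * 0.5, 1.0, size=(49, 5))   # one feature group
extra = rng.normal(y[:, None] * 0.5, 1.0, size=(49, 10))   # remaining features
fusion = np.hstack([single, extra])                        # fused feature vector

# Mean 5-fold cross-validated accuracy for each feature representation.
acc_single = cross_val_score(SVC(kernel="linear"), single, y, cv=5).mean()
acc_fusion = cross_val_score(SVC(kernel="linear"), fusion, y, cv=5).mean()
```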

Table 1. The significance analysis of eye tracking length
Table 2. The significance analysis of fixation points
Table 3. The significance analysis of response time
Table 4. Results trained with single feature and fusion feature through cross-validation

5 Discussion

In this eye movement experimental paradigm, the emotional face images appear randomly on the left or right. Compared with a paradigm that places face images in the center, the new method maintains the basic structure of the task while enriching its psychological semantics. Its advantages are better observation of the subjects' psychological status and avoidance of situations in which subjects deliberately stare at the center of the screen waiting for the face image to appear. Moreover, the system combines eye movement and response time to improve the classification accuracy.

After screening the eye movement and response time data, we obtained 49 sets of data (24 normal, 25 abnormal). As the histograms show, the normal subjects' eye movement path lengths are shorter than the depressed subjects', meaning that depressed people need longer eye movement paths in this experiment. The normal subjects also produce fewer fixation points than the depressed subjects. Both results imply that depressed people need more attention to understand the picture. The response times of the two groups form a clearly bimodal distribution, and normal people respond faster than depressed people.

According to the significance analysis, most of the eye movement length, fixation point and response time characteristics are significant, indicating that these features are discriminative reflections of the subjects' psychological status. However, \(P_{11}\)-Set C, \(M_{11}\)-Set B, \(L_{11}\)-Set C and \(L_{01}\)-Set B are not significant in the independent-sample t-test, which means there are no significant differences between the two groups under positive facial stimulation or positive background priming. This is powerful evidence for the negative-attraction phenomenon in depressed people.

The eye movement length reflects the scanning distance of the subject's attention, the number of fixation points shows the area the subject attends to, and the keyboard response time measures the interval between stimulus presentation and the beginning of the reaction. Through cross-validation, the classification accuracies using these characteristics are 77.56%, 75.51%, 71.42% and 79.59% respectively, indicating that the features are discriminative. Although the accuracy of response time is acceptable, its sensitivity is lower than that of fixation points and eye movement length. Owing to its duration sensitivity and local adaptivity, the I-DT algorithm achieves higher accuracy than the I-VT algorithm, and its sensitivity is the highest among the single features. The fusion of keyboard response time, eye movement length and fixation points improves the classification accuracy to 83.67%; more importantly, the increase comes from sensitivity. By adjusting the penalty weights of the positive and negative samples to meet the demands of a screening system, the sensitivity increases from 76% to 92% at the expense of precision. The PR and ROC curves also show that the fusion feature performs better than the individual features. These results confirm the effectiveness of feature fusion in our system.