Keywords

1 Introduction

Psychological scales are commonly used to screen for psychological problems, based on theories about the attentional bias of people with emotional disorders [1]; examples include the Depression Anxiety Stress Scale (DASS) [2], the Cognitive Emotion Regulation Questionnaire (CERQ) [3] and the Mini International Neuropsychiatric Interview [4]. However, scales have some shortcomings: children may not understand the questions, and subjects may deliberately choose options that do not reflect their actual status.

Research on attention bias and psychological problems using image stimuli has the advantages of objectivity and direct observation of reactions. It has become an important method in psychological research, but the relationship between image semantics and psychological status remains difficult to establish. Anderson et al. [14] introduced text and natural scene images to capture the phenomenon of negative attraction. More image-related elements have gradually been introduced into behavioral experiments, such as informative pictures and emotional faces [5, 6]. Bao et al. [7] conducted a novel study on the semantic mapping between the MMPI and scene images, providing an affective image library with images labeled as positive or negative. Response time is widely used in psychological research as an important characteristic. Li et al. [8] and Wang [12] proposed a paradigm based on natural scene images and emotional face pictures, using keyboard response time to distinguish between different people's mental states. These studies focused on observing different people's responses by analyzing how people perceive and process image stimuli.

In addition to the keyboard, eye trackers have been used in psychological experiments in recent years. Non-contact eye movement information is captured directly, with a high degree of acceptance and precision, and many scholars have adopted eye movement methods for analysis. Bloem et al. [11] used visual images to discover the role of gaze in mental imagery and memory. Duque and Vázquez [10] used positive and negative emotional faces with eye tracking to observe the double attention bias in clinical depression.

However, these methods use the eye movement heat map or fixation points as features, ignoring eye movement path length and response time, which reflect the subjects' overall status in the experiment. On the other hand, methods based on keyboard response time alone may not fully reflect people's attention, since response time is related to several factors, such as the saliency of the facial expression and the subject's age. Eye movement feature extraction, which relies on the experimental paradigm and the eye movement data processing algorithm, must reflect the subject's psychological characteristics accurately. Furthermore, combining keyboard and eye movement data describes people's psychological status better: the fusion of keyboard response time and eye movement reflects the subject's unified response and generates high-dimensional features, which also improves the classification accuracy.

In this paper, we present a complete system with a new paradigm in which face images are shown on the left or right side of the background with equal probability. We collect the keyboard response time for identifying emotional faces and record eye movement during the experiment. By analyzing the collected data, we find significant discrepancies between normal and depressed people. The accuracy of the experiment is improved compared with the methods of Li et al. [8] and Wang [12].

2 Experiment

2.1 Materials and Participants

We use 16 emotional face images (8 positive and 8 negative) from the Taiwanese Facial Expression Image Database [16] as the foreground stimuli and 80 emotional images (40 positive and 40 negative) chosen from ThuPIS [8] as the background scenes. All face images are converted to grayscale. All images in ThuPIS were chosen from the IAPS [17] and Google and screened based on the method of Bao et al. [7]. Samples of the face and scene images are shown in Fig. 1.

30 patients with depressive disorder (23 males and 7 females; age M = 22.5, SD = 3.86) were recruited from two hospitals, and 30 normal controls (18 males and 12 females; age M = 23.2, SD = 0.69) were university students.

Fig. 1.
figure 1

Examples of facial expressions and scene pictures. The first row shows positive faces and scenes; the second row shows negative faces and scenes.

2.2 Model

The whole system is divided into 4 parts as shown in Fig. 2.

Experimental paradigm. The purpose of the experimental paradigm is to observe and analyze the subjects' response data through emotional images. Details are given in Sect. 2.3.

Data collection. The eye movement data are collected by a Tobii eye tracker, which is widely used in psychological research. The eye tracker records eye movement characteristics during visual information processing, since psychological activities have a direct or indirect relationship with eye movement. Each subject was calibrated using the Tobii EyeX Interaction application before the start of the test. The subjects' response times are collected by keyboard or button.

Characteristics extraction. The collected data is converted into a feature vector by a fusion algorithm which is introduced in Sect. 3.1.

Data analysis. In this system, a Support Vector Machine (SVM) [16] is used to classify normal and depressed people; we also use SPSS (Statistical Product and Service Solutions) for significance testing.

2.3 Procedure

The experiment is a Competing-Priming (C-P) effect experiment. Compared with former research [7, 8], it improves the placement of the face images: they are shown on the left or right side of the background with equal probability. The participants are required to read the instructions on the screen. The procedure of this experiment is shown in Fig. 2. Participants are first given 20 practice trials and are then asked to complete 80 formal experimental trials. In each trial, we present the scene background, and an emotional face appears randomly on the left or right side after 500–1000 ms; subjects then make a judgment by pressing a button. This study focuses on the competing and priming effects of the background. The eye tracking path, response time and accuracy of each trial are recorded.

Fig. 2.
figure 2

The system model.

3 Reaction Characteristics Extraction

3.1 Extraction Algorithm

Eye Movement. F(x, y, z, t) is obtained at each moment by the Tobii eye tracker, indicating that the fixation point is at (x, y) and the distance from the screen to the eye is z at time t. The coordinate system is shown in Fig. 3.

Fig. 3.
figure 3

Coordinate and procedure.

During a single trial, the background image appears at time \(t_1\), the foreground face appears at time \(t_2\), and the subject presses the key at time \(t_3\). We then construct three sets A, B, C from these three times. Set A = \(\{(x, y, t) | t_1<t<t_2\}\) represents the subject's eye movement from the appearance of the background to the appearance of the face; during this time the subject focuses on the background image, so we call it the cognitive period. Set B = \(\{(x, y, t) | t_2<t<t_3\}\) covers the period from the appearance of the face until the subject makes a decision; here the subject processes the foreground and background images and makes a button selection, so we call it the selective period. Set C = \(\{(x, y, t) | t_1<t<t_3\}\), the union of A and B, represents the eye movement data of the whole trial.
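The A/B/C partition can be sketched as follows (a minimal illustration; the function and variable names are ours, not from the authors' implementation):

```python
def split_periods(samples, t1, t2, t3):
    """Partition gaze samples (x, y, t) into the cognitive period A
    (background onset t1 -> face onset t2), the selective period B
    (face onset t2 -> key press t3), and the whole trial C = A ∪ B."""
    A = [(x, y, t) for (x, y, t) in samples if t1 < t < t2]
    B = [(x, y, t) for (x, y, t) in samples if t2 < t < t3]
    C = A + B
    return A, B, C
```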

For a set of data with length n, (\(x_1\), \(y_1\), \(t_1\)), ..., (\(x_n\), \(y_n\), \(t_n\)), we process the data as shown in Fig. 4. Steps 1 and 2 check the continuity and length of the data, where Distance(i, i + 1) is the pixel distance between points i and i + 1. Step 3 determines whether the data points lie within the screen. The final output is obtained by merging all steps.

Fig. 4.
figure 4

Eye movement data preprocessing. We determine the continuity and integrity of the data through three steps.
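As an illustration, the three checks of Fig. 4 might look like the following sketch; the screen size and the continuity and length thresholds here are assumptions, since the paper does not state them:

```python
import math

# Assumed constants for illustration only (not values from the paper).
SCREEN_W, SCREEN_H = 1920, 1080
MAX_JUMP_PX = 300      # Step 1: continuity - assumed maximum gap between samples
MIN_POINTS = 10        # Step 2: assumed minimum number of samples per trial

def is_valid_trial(points):
    """points: list of (x, y, t). Return True if the trial passes all checks."""
    # Step 2: the trial must contain enough data points
    if len(points) < MIN_POINTS:
        return False
    # Step 1: continuity - no large jump between consecutive samples
    for (x1, y1, _), (x2, y2, _) in zip(points, points[1:]):
        if math.hypot(x2 - x1, y2 - y1) > MAX_JUMP_PX:
            return False
    # Step 3: every point must lie within the screen
    return all(0 <= x < SCREEN_W and 0 <= y < SCREEN_H for (x, y, _) in points)
```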

Eye movement path length is calculated as:

$$\begin{aligned} L = \sum _{i=1}^{n-1} \sqrt{(x_{i+1}-x_i)^2+(y_{i+1}-y_i)^2}. \end{aligned}$$
(1)
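Eq. (1) translates directly into code, for example:

```python
import math

# Direct implementation of Eq. (1): the sum of Euclidean distances
# between consecutive gaze samples.
def path_length(points):
    """points: list of (x, y) gaze coordinates in pixels."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```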

Salvucci and Goldberg [9] summarized several methods for detecting fixation points, including the I-VT, I-HMM, I-DT, I-MST and I-AOI algorithms. In this paper we use the I-VT (fast) and I-DT (accurate and robust) algorithms. The I-VT algorithm calculates point-to-point velocities, labels each point below a velocity threshold as a fixation point, and collapses consecutive fixation points into fixation groups; the velocity threshold is set to 900 pixels/second. The I-DT algorithm uses a dispersion threshold and a duration threshold. Since the image is a two-dimensional signal, we use the Euclidean distance instead of the Manhattan distance:

$$\begin{aligned} \text {Dispersion } D = \max _{i,j\in \{1,2,\ldots ,n\}} \sqrt{(x_{i} - x_{j})^2 +(y_{i} - y_{j})^2} \end{aligned}$$
(2)

The dispersion threshold is set to 30 pixels, corresponding to \(1/2^{\circ }\) to \(1^{\circ }\) of visual angle. The duration threshold is set to 83 ms (5 sampling intervals of the eye tracker).
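A sketch of the I-DT detector with the Euclidean dispersion of Eq. (2) follows; it uses the 30-pixel and 83-ms thresholds from the text, while the function names and window-handling details are our own reconstruction of the algorithm in [9]:

```python
import itertools, math

DISPERSION_PX = 30.0   # dispersion threshold from the text
DURATION_S = 0.083     # duration threshold from the text (83 ms)

def dispersion(window):
    """Maximum pairwise Euclidean distance among (x, y, t) samples (Eq. 2)."""
    return max((math.hypot(a[0] - b[0], a[1] - b[1])
                for a, b in itertools.combinations(window, 2)), default=0.0)

def idt_fixations(points):
    """points: (x, y, t) samples sorted by t; returns fixation centroids."""
    fixations, i, n = [], 0, len(points)
    while i < n:
        # initial window: the samples needed to span the duration threshold
        j = i
        while j < n and points[j][2] - points[i][2] < DURATION_S:
            j += 1
        if j == n:          # remaining samples cannot span the duration
            break
        window = points[i:j + 1]
        if dispersion(window) <= DISPERSION_PX:
            j += 1
            # grow the window while dispersion stays under the threshold
            while j < n and dispersion(window + [points[j]]) <= DISPERSION_PX:
                window.append(points[j])
                j += 1
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append((cx, cy))
            i = j
        else:
            i += 1          # slide the window start forward
    return fixations
```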

Fig. 5.
figure 5

Examples of eye tracking paths. Each white circle represents a fixation point. All 4 pictures contain a positive face. The pictures show the eye movement paths of normal people (first row) and depressed people (second row) (Color figure online)

A sample eye movement path is shown in Fig. 5. The path starts red and changes gradually through green to blue. The face image appears at the moment the line turns green; that is, the red-to-green path is the cognitive path and the green-to-blue path is the selective path.

Response Time. We calculate the mean and variance of the collected data, which is divided into four groups based on the combination of foreground and background. The specific algorithm is shown in Algorithm 1.

figure a

The purpose of data preprocessing steps 1–4 is to remove trials affected by misunderstanding of the experimental requirements or by lack of concentration, including trials in which participants were distracted. Step 5 removes abnormal data caused by external factors such as software abnormalities or database bugs.

We define the subscript 0 as negative and 1 as positive, so that 01 denotes the combination of negative scenes and positive face images, and so on. We therefore obtain 4 response-time means and their overall mean (\(M_{00}\), \(M_{01}\), \(M_{10}\), \(M_{11}\), \(M_M\)), and 4 standard deviations and their mean (\(STD_{00}\), \(STD_{01}\), \(STD_{10}\), \(STD_{11}\), \(STD_M\)).
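The grouped statistics can be sketched as follows, assuming each trial is stored as (scene valence, face valence, response time) with the 0/1 convention above (the data layout is our assumption, not the authors' format):

```python
import statistics

def grouped_stats(trials):
    """trials: list of (scene_valence, face_valence, response_time),
    with valence 0 = negative and 1 = positive."""
    groups = {(s, f): [] for s in (0, 1) for f in (0, 1)}
    for scene, face, rt in trials:
        groups[(scene, face)].append(rt)
    # per-group mean and (population) standard deviation
    M = {k: statistics.mean(v) for k, v in groups.items()}
    STD = {k: statistics.pstdev(v) for k, v in groups.items()}
    M_M = statistics.mean(M.values())      # overall mean of the 4 means
    STD_M = statistics.mean(STD.values())  # mean of the 4 standard deviations
    return M, STD, M_M, STD_M
```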

3.2 Significance Test

We perform significance tests for the mean times (\(M_{00}\), \(M_{01}\), \(M_{10}\), \(M_{11}\), \(M_M\)), path lengths (\(L_{00}\), \(L_{01}\), \(L_{10}\), \(L_{11}\), \(L_M\)) and fixation point counts (\(P_{00}\), \(P_{01}\), \(P_{10}\), \(P_{11}\), \(P_M\)).

The independent-sample t-test uses t-distribution theory to infer the probability that an observed difference arises by chance, so as to compare two group means. In this context, significance is examined with respect to the emotional attributes of the background and foreground and the different mental states of the subjects.

In the significance test, the value F represents the ratio of the regression model's explained variance to the residual variance, and the Sig value is calculated from F.
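For illustration, the equal-variance independent-sample t statistic can be computed as below (a minimal sketch; in practice SPSS, or scipy.stats.ttest_ind, also returns the p-value):

```python
import math, statistics

def t_statistic(a, b):
    """Equal-variance independent-sample t statistic for groups a and b."""
    na, nb = len(a), len(b)
    ma, mb = statistics.mean(a), statistics.mean(b)
    # pooled variance of the two groups
    sp2 = ((na - 1) * statistics.variance(a) +
           (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))
```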

3.3 SVM

We use an SVM to discriminate between normal and depressed people's psychological status. The data produced by our system are \(M_i\)(\(M_{00i}\), \(M_{01i}\), \(M_{10i}\), \(M_{11i}\), \(M_{Mi}\)) with labels \(Y_i \in \{-1,1\}\), i = 1, 2, ..., N. We suppose that the first q samples are positive and the remaining N-q samples are negative. Two issues arise in the practical application of our system.

(i) The data collection is unbalanced between the two groups: the depressed subjects' data are fewer than the normal subjects' data.

(ii) Under a given false-alarm probability, a large-scale screening system needs higher accuracy on negative samples.

In view of these problems, we use two different penalty factors \(C_+\), \(C_-\) in place of the single factor C in the SVM algorithm, so the problem becomes:

$$\begin{aligned} min\varphi (\omega )=\frac{1}{2}\Vert \omega \Vert ^2+C_+\sum _{i=1}^q\xi _i+C_-\sum _{i=q+1}^N\xi _i \end{aligned}$$
(3)
$$\begin{aligned} s.t. \left\{ \begin{array}{lll} Y_i(\omega \cdot M_i+b)-1+\xi _i \ge 0 \\ \xi _i \ge 0 \end{array} i=1,2,\ldots N \right. \end{aligned}$$
(4)

Using the Lagrangian function, the problem turns into Eqs. 5 and 6:

$$\begin{aligned} \min Q(\alpha )=\frac{1}{2}\sum _{i=1}^N\sum _{j=1}^N\alpha _i\alpha _jY_iY_jM_i\cdot M_j-\sum _{i=1}^N\alpha _i \end{aligned}$$
(5)
$$\begin{aligned} s.t. \left\{ \begin{array}{lll} 0 \le \alpha _i \le C_+,\ i=1,2,\ldots ,q \\ 0 \le \alpha _j \le C_-,\ j=q+1,\ldots ,N\\ \end{array} \qquad \sum _{i=1}^N Y_i \alpha _i=0 \right. \end{aligned}$$
(6)

After iterating with the SMO algorithm, we find the best set of \(\alpha _i\) for the separating hyperplane. Setting \(C_->C_+\) makes the weight of the negative samples greater than that of the positive samples, which moves the classification hyperplane closer to the positive samples and thus achieves the purpose of screening. The same analysis also applies to path length, fixation points and all combinations of these features.
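The asymmetric penalties \(C_+\), \(C_-\) of Eq. (3) correspond to per-class weights in standard soft-margin SVM implementations. Below is a minimal sketch with scikit-learn on synthetic data (the real features are the response-time and eye-movement vectors; everything here is made up for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-ins for the two groups of subjects.
rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 0.5, size=(20, 5))    # positive (depressed) samples
X_neg = rng.normal(-1.0, 0.5, size=(40, 5))   # negative (normal) samples
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 20 + [-1] * 40)

# class_weight scales C per class: making C_- larger than C_+ pushes the
# hyperplane toward the positive samples, as described in the text.
clf = SVC(kernel="linear", C=1.0, class_weight={1: 1.0, -1: 4.0})
clf.fit(X, y)
train_acc = clf.score(X, y)
```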

Fig. 6.
figure 6

The PR and ROC curves of single features and the fusion feature

4 Result

The histograms of the eye tracking length, fixation points and response time are shown in Fig. 7. The significance analysis of the characteristics is shown in Tables 1, 2 and 3 (S = significant, NS = not significant).

Fig. 7.
figure 7

The histograms of eye tracking length, fixation points and response time.

We use these data to train classifiers through cross-validation and then distinguish the two types of people, training SVM models on each separate feature and on their fusion. The results are shown in Table 4, and the PR and ROC curves are shown in Fig. 6. There are some inflection and turning points in the curves because of the scale of the data and a few classification errors.
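The cross-validated comparison of a single feature against the fusion feature can be sketched as follows, with synthetic stand-ins for the real feature vectors (the dimensions are illustrative; the 25/24 split mirrors the screened data):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-ins for one feature set and a fused feature vector.
rng = np.random.default_rng(1)
y = np.array([1] * 25 + [-1] * 24)
single = rng.normal(y[:, None] * 0.5, 1.0, size=(49, 5))   # one feature group
extra = rng.normal(y[:, None] * 0.5, 1.0, size=(49, 10))   # remaining features
fusion = np.hstack([single, extra])                        # fused feature vector

# Mean 5-fold cross-validated accuracy for each feature representation.
acc_single = cross_val_score(SVC(kernel="linear"), single, y, cv=5).mean()
acc_fusion = cross_val_score(SVC(kernel="linear"), fusion, y, cv=5).mean()
```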

Table 1. The significance analysis of eye tracking length
Table 2. The significance analysis of fixation points
Table 3. The significance analysis of response time
Table 4. Results trained with single feature and fusion feature through cross-validation

5 Discussion

In this eye movement experimental paradigm, the emotional face images appear randomly on the left or right. Compared with a paradigm that places face images in the center, the new method maintains the basic structure of the task while enriching its psychological semantics. Its advantages are better observation of the subjects' psychological status and avoidance of situations in which subjects deliberately stare at the center of the screen waiting for the face image to appear. Moreover, the system combines eye movement and response time to improve the classification accuracy.

After screening the eye movement and response time data, we obtained 49 sets of data (24 normal, 25 abnormal). As the histograms show, the normal subjects' eye movement path lengths are shorter than the depressed subjects', meaning that depressed people need longer eye movement paths in this experiment. The normal subjects also produce fewer fixation points than the depressed subjects. Both results imply that depressed people need more attention to understand the picture. The response times of the two groups form a clearly bimodal distribution, and normal people respond faster than depressed people.

According to the significance analysis, most of the eye movement length, fixation point and response time characteristics are significant, indicating that these features are discriminative reflections of the subjects' psychological status. However, \(P_{11}\)-Set C, \(M_{11}\)-Set B, \(L_{11}\)-Set C and \(L_{01}\)-Set B are not significant in the independent-sample t-test, which means there are no significant differences between the two groups under positive facial stimulation or positive background priming. This is powerful evidence for the negative-attraction phenomenon in depressed people.

The eye movement length reflects the scanning distance of the subject's attention, the number of fixation points shows the area the subject attends to, and the keyboard response time measures the interval between stimulus presentation and the beginning of the reaction. Through cross-validation, the classification accuracies using these characteristics are 77.56%, 75.51%, 71.42% and 79.59% respectively, indicating that the features are discriminative. Although the accuracy of response time is acceptable, its sensitivity is lower than that of fixation points and eye movement length. Owing to its duration sensitivity and local adaptivity, the I-DT algorithm achieves higher accuracy than the I-VT algorithm, and its sensitivity is the highest among the single features. The fusion of keyboard response time, eye movement length and fixation points improves the classification accuracy to 83.67%; more importantly, the increase comes from sensitivity. By adjusting the penalty weights of the positive and negative samples to meet the demands of a screening system, the sensitivity increases from 76% to 92% at the expense of precision. The PR and ROC curves also show that the fusion feature performs better than the individual features. These results confirm the effectiveness of feature fusion in our system.