Abstract
This paper outlines the first phase of our implementation of a system for non-intrusive estimation of a computer user’s affective state, based on the Circumplex Model of Affect [1], from monitoring of the user’s pupil diameter and facial expression [2]. The details of the original design plan for this system have been described previously [2]. This outline describes each part of the data collection process, including: obtaining 3D facial coordinates with the Kinect, recording the pupil diameter signal, encoding the facial expression as Facial Animation Parameter indices, and the setup of the experiment.
1 Introduction
As we have described in our proposal [2] for non-intrusive estimation of a computer user’s affective state based on the Circumplex Model of Affect [1], our goal is to build a supervised machine learning system to classify the user’s state of affect. Thus, the design of the data collection process plays an extremely important role in obtaining reliable results. To collect the data, we set up the experiment in a scenario where human subjects will be presented with images from the International Affective Picture System (IAPS) [3] to elicit from them affective reactions that will be manifested through their involuntary changes in pupil diameter and in their facial expressions, while they also report a subjective assessment of their reactions through the Self-Assessment Manikin (SAM) [4]. During the recording sessions, a Kinect sensor will be used to collect the 3D facial coordinates and the Facial Animation Parameter Units (FAPUs) [5] from the subject’s face, as well as an estimate of the illumination level in the area around the subject’s eyes. Simultaneously, an Eye Gaze Tracking (EGT) system will be used to record the pupil diameter of the subject’s eyes. The self-reports of arousal and valence marked by the subject in SAM for each IAPS image will also be recorded into the dataset for later use. The remainder of this section briefly explains terms used throughout this article. The following section then discusses in detail the experimental procedure and the data acquisition process outlined above.
Circumplex Model of Affect: The Circumplex Model of Affect was introduced by Russell [1] as a proposal to model the affective state of a human. Russell proposed that an affective state consists of two parameters: arousal and valence. Their relationship can be visualized as a four-quadrant graph where valence is the horizontal axis and arousal is the vertical axis. Other studies have shown a similar trend, with models whose two parameters are referred to using different terms [6, 7].
The International Affective Picture System: The International Affective Picture System (IAPS) is a large set of color photographs that elicit shifts in the subject’s arousal and valence. IAPS contains a wide variety of stimulus types, covering more than 1,000 exemplars of human experience: joyful, sad, fearful, attractive, or angry people, simple objects, scenery, etc. The idea is to present the subject with visual stimuli to modify his/her affective state while recording his/her reaction. The IAPS has been used worldwide across various fields of study to investigate emotion and attention, and it is well known for its replicability and robustness. Pictures from IAPS are rated with mean arousal, pleasure, and dominance values, based on reactions from men and women, which makes them suitable as stimuli in this study. More in-depth information about IAPS can be found in [3].
Self-Assessment Manikin: The Self-Assessment Manikin (SAM) [4] is a non-verbal, pictorial assessment technique that directly reports the pleasure, arousal, and dominance associated with the affective state of the subject while exposed to a stimulus. We focus mainly on the 2-dimensional Circumplex Model of Affect; therefore, dominance reactions are not considered. As demonstrated in Fig. 1, the SAM figure varies along each scale. On the arousal scale, the left-most figure corresponds to an extremely stimulated, excited, frenzied, jittery, wide-awake, or aroused state, while the other end of the scale represents a completely relaxed, calm, sluggish, dull, sleepy, or unaroused state. The scale ranges from 1 to 9 to allow fine-grained intermediate ratings. The pleasure (valence) scale works the same way, except that, in this case, the left-most figure represents a very happy, pleased, satisfied, contented, hopeful state, while the opposite end represents a very unhappy, annoyed, unsatisfied, melancholic, despairing, bored state.
2 Experiment Setup
The entire data collection process is depicted in the diagram shown in Fig. 2. The diagram describes the process handled by the AffectiveMonitor application [2] and indicates the list of output files for post-experiment data analysis. The Kinect, running on the primary machine, is responsible for obtaining the 3D facial coordinates, while the TM3 Eye-Gaze Tracker, running on a secondary machine, records the pupil diameter signals and sends them over to the primary machine. The desired data are then recorded during the experiment session and written out to the output files frame by frame. The experiment setup and its environment are shown in Fig. 3a.
2.1 Experiment Procedure
AffectiveMonitor has a separate “Experiment” interface tab (Fig. 3b) to conduct the experiment from start to end. The experiment takes about 35 min. Before the experiment session begins, the subject will go through a calibration process consisting of adjusting the shape of a 3D facial model and adjusting the subject’s position for pupil diameter recording. 70 pictures selected from IAPS will then be shown to the subject, one after another, until all samples have been presented. For each sample, the subject is asked to look at the picture for 6 s and then, immediately after, rate their affective state via SAM (5 s). In between samples, a gray screen is shown during a resting period. The subject is urged to stay still during the first 6 s, when he/she is first presented with the stimulus, in order to reduce the measurement interference that could occur during the recording process.
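The per-trial timing above can be summarized in a short sketch. Only the 6 s stimulus and 5 s SAM durations come from the procedure description; the rest-period length and the helper name are our illustrative assumptions (the rest value is chosen so that 70 trials fill roughly the stated 35 min).

```python
# Per-trial timing of the experiment session (illustrative sketch).
STIMULUS_S = 6   # IAPS picture shown; subject asked to stay still
SAM_S = 5        # SAM self-assessment rating
REST_S = 19      # gray-screen rest between samples (assumed duration)

def session_length_minutes(n_trials: int = 70) -> float:
    """Approximate length of the presentation phase, in minutes."""
    return n_trials * (STIMULUS_S + SAM_S + REST_S) / 60.0
```

With these assumed durations, the 70-trial presentation phase alone accounts for about 35 min, before calibration time is added.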
2.2 Sample Selection
For the experiment, we selected IAPS pictures on the basis of the mean and variance of the arousal and valence ratings that come with each picture in the IAPS repository. Our criterion for selecting the samples is based on the study of a 12-Point Affect Circumplex (12-PAC) model of Core Affect [8], which is in turn based on the Circumplex Model of Affect. That study hypothetically divides the Circumplex model into twelve segments, called the 12-Point Affect Circumplex (12-PAC) structure. By correlating many previous studies with their own, the authors report their analysis and their placement of moods on the 12-PAC structure, as shown in Fig. 4b. Based on this study, we selected IAPS samples located around the desired angles of those core affects that have more than a 60% likelihood of appearing in the Circumplex Model. Accordingly, we selected the 70 samples shown in Fig. 4a; they are also listed in Table 1 by their IAPS picture ID, categorized by core affect description.
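Selection by angle in the valence–arousal plane can be sketched as follows. This is not the authors’ actual selection code: the re-centering of the 9-point SAM scales at 5, the angular tolerance, and all names are our assumptions for illustration.

```python
import math

def affect_angle(valence: float, arousal: float, center: float = 5.0) -> float:
    """Angle (degrees, 0-360) of a picture's mean rating in the Circumplex
    plane, after re-centering the 9-point scales so (5, 5) maps to the
    origin. 0 deg = pleasant axis, 90 deg = activated axis."""
    return math.degrees(math.atan2(arousal - center, valence - center)) % 360.0

def select_near(pictures, target_deg: float, tol_deg: float = 15.0):
    """Return IDs of pictures whose mean (valence, arousal) rating falls
    within tol_deg of a target 12-PAC segment angle.
    `pictures` is a list of (picture_id, valence_mean, arousal_mean)."""
    chosen = []
    for pid, v, a in pictures:
        # signed angular difference wrapped into [-180, 180)
        diff = abs((affect_angle(v, a) - target_deg + 180.0) % 360.0 - 180.0)
        if diff <= tol_deg:
            chosen.append(pid)
    return chosen
```

Repeating `select_near` for each of the twelve 12-PAC segment angles, and keeping only segments whose core affects exceed the 60% likelihood criterion, yields a sample pool like the one in Table 1.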
3 Data Acquisition
In this section, we explain the method of obtaining each parameter including 3D facial coordinates, pupil diameter, Facial Animation Parameter (FAP), and illumination around the facial area. All of them are recorded with the same timestamp by the AffectiveMonitor application.
3.1 3D Facial Coordinates
Kinect provides a basic software framework called HD Face [9]. The framework can detect the face of the closest person in front of the Kinect sensor and generate that person’s 3D facial mesh model in real time. Another interesting feature of this framework is the ability to reconstruct the person’s face shape by 3D scanning, to acquire a very accurate characterization of the person’s face. Given all that, we have integrated this framework into our AffectiveMonitor application to benefit from all the functionality that Kinect has to offer. The mesh model can also be represented by 3D coordinates, which can be thought of as markers attached to the subject’s face: whenever the subject’s facial expression changes, the markers move according to the corresponding facial muscle movement. By recording frame by frame, we can observe the changes in the 3D facial coordinates that occur because of the subject’s facial expression.
One problem that arises in the design of the experiment is the impossibility of restraining the movement of the subjects during the experiment. Body shifts can alter the position and orientation of the subject’s face, which may complicate processing. To circumvent this issue, we built a feature into AffectiveMonitor to artificially re-position and re-orient the subject’s face before recording the values. Fortunately, Kinect also provides the pivot point as well as the orientation (as a quaternion) of the face. Thus, we can reverse the rotation and transform the point cloud to a neutral position at the origin by applying a change of coordinate frame, as described in [10].
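The change of coordinate frame can be sketched as follows: translate each point by the negated pivot, then apply the inverse (conjugate) of the reported head-orientation quaternion. The (w, x, y, z) quaternion convention and the function names here are our assumptions, not Kinect SDK calls.

```python
import numpy as np

def quat_conjugate(q):
    """Conjugate of a unit quaternion (w, x, y, z), i.e. the inverse rotation."""
    w, x, y, z = q
    return np.array([w, -x, -y, -z])

def rotate(q, v):
    """Rotate 3-vector v by unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    u = np.array([x, y, z])
    # standard identity: v' = v + 2 u x (u x v + w v)
    return v + 2.0 * np.cross(u, np.cross(u, v) + w * v)

def neutralize(points, pivot, orientation):
    """Undo the reported head pose: move the pivot to the origin, then
    apply the inverse of the orientation quaternion to every point."""
    q_inv = quat_conjugate(orientation)
    return np.array([rotate(q_inv, p - pivot) for p in points])
```

After this step, every frame’s point cloud sits in the same face-centered frame, so coordinate changes reflect facial muscle movement rather than body shifts.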
3.2 Pupil Diameter and Illumination
To acquire the pupil diameter signals, we utilize the TM3 Eye-Gaze Tracker (EGT), which has the capability to measure the pupil diameter using the dark-pupil method. We set the sampling interval at 0.33 s and average the samples over a 30-sample averaging window. The pupil diameter signals are then transferred to the primary machine via TCP/IP over an Ethernet cable. AffectiveMonitor has a feature to plot the average of the pupil diameter dynamically, as shown in Fig. 6a.
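The 30-sample averaging window amounts to a simple running mean over the raw readings; a minimal sketch (with an illustrative function name, not the TM3 or AffectiveMonitor API):

```python
def moving_average(samples, width=30):
    """Running mean over the most recent `width` raw pupil readings;
    early outputs use whatever history is available."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - width + 1):i + 1]
        out.append(sum(window) / len(window))
    return out
```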
Many studies have shown that the pupil diameter is under the influence of the Autonomic Nervous System (ANS) and can be used as a marker of arousal level [11]. Unfortunately, the pupil diameter is also susceptible to the amount of light reaching the retina. To address this issue, we plan to perform a post-processing step to eliminate the effect of the pupillary light reflex using an adaptive signal processing technique. To attain that goal, the illumination around the eyes must also be recorded as one of the output parameters. We obtain the illuminance using Kinect’s RGB camera by cropping the video around the eye area (Fig. 6b) and calculating the illumination from the cropped video. A more detailed explanation of this subject will be reported in a separate article, under preparation (Fig. 5).
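One common form of such adaptive processing (used, for instance, in [11]) is an interference canceller in which the recorded illumination serves as the reference input and the filter output is subtracted from the pupil signal. As an illustration only, a minimal LMS-based version could look like this; the tap count and step size are placeholder choices, not values from our post-processing pipeline.

```python
import numpy as np

def lms_cancel(pupil, illum, n_taps=20, mu=0.01):
    """LMS adaptive interference cancellation: estimate the component of
    the pupil-diameter signal correlated with illumination and subtract
    it, leaving the (affect-related) residual.
    `pupil` and `illum` are equal-length 1-D arrays on shared timestamps."""
    w = np.zeros(n_taps)
    residual = np.zeros(len(pupil))
    for n in range(len(pupil)):
        # most recent illumination samples, newest first, zero-padded
        x = illum[max(0, n - n_taps + 1):n + 1][::-1]
        x = np.pad(x, (0, n_taps - len(x)))
        y = w @ x                    # light-reflex estimate
        e = pupil[n] - y             # error = residual of interest
        w += 2.0 * mu * e * x        # LMS weight update
        residual[n] = e
    return residual
```

As the weights converge, the residual tracks only the pupil variation that the illumination signal cannot explain, which is the component attributed to ANS activity.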
3.3 Facial Animation Parameter
The Facial Animation Parameter (FAP) is one of the components of the MPEG-4 Face and Body Animation (FBA) International Standard (ISO/IEC 14496-1 & -2) [13], which describes a standard protocol for encoding the virtual representation of human and humanoid movement, specifically around the facial region of the body. FAPs are commonly used to describe basic actions of facial expression for a synthetic face; for instance, in the CANDIDE model [5]. The ability of FAPs to encode primitive expression information with small memory usage makes them interesting as an alternative way to preserve the subject’s facial expression.
The Facial Animation Parameters (FAPs) are defined as displacements between the facial feature points defined by FBA (see Fig. 7), measured in Facial Animation Parameter Units (FAPUs). FAPUs are normally calculated from distances on a neutral face, divided by 1024 so that the unit is small enough for FAPs to be represented as integers. The purpose of FAPUs is to allow a consistent interpretation of FAP indices for any facial model, regardless of its shape and dimensions. The descriptions of the FAPUs and how to calculate them are listed in Table 2. Of the total of 68 FAPs [12], we decided to output the 19 FAPs listed in Table 3, which are most actively related to basic facial expressions. Note that in Fig. 7 the numbering of the facial feature points follows FBA, while Kinect’s indexing scheme is listed differently; see Table 3 for the correspondence between Kinect’s indices and FBA’s feature points.
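The FAPU arithmetic above can be made concrete in a few lines. The numbers in the usage note are hypothetical measurements, not values from our dataset.

```python
def fapu(neutral_dist: float) -> float:
    """A Facial Animation Parameter Unit: a reference distance measured on
    the neutral face (e.g. eye separation or mouth width), divided by 1024
    so that FAPs can be stored as small integers."""
    return neutral_dist / 1024.0

def fap_value(current: float, neutral: float, unit: float) -> int:
    """Encode a feature-point displacement as an integer FAP: the
    displacement from the neutral position, expressed in FAPUs."""
    return round((current - neutral) / unit)
```

For example, a reference distance of 102.4 mm on the neutral face gives a FAPU of 0.1 mm, so a feature point that moves 0.5 mm from its neutral position encodes as the integer FAP value 5.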
4 Discussion and Conclusion
Given all the previous explanations, we would like to re-emphasize that the purpose of this work is to collect data suitable for training a supervised machine learning model to classify the affective state of the subject in the Circumplex Model of Affect. In order to achieve that, we have to estimate the two parameters of our model: arousal and valence. In the case of arousal, we have found strong evidence supporting the notion that the pupil diameter is influenced by the Autonomic Nervous System, which is responsible for the state of arousal. In the case of valence, we decided to estimate this parameter on the basis of the subject’s facial expression, since pleasure and displeasure are directly and naturally expressed by the activity of the facial muscles. Two data formats representing the facial expression are recorded, 3D facial coordinates and Facial Animation Parameter indices, and each has pros and cons: the 3D coordinates are practical because they preserve all the information in the facial expression without any loss, while FAPs are better in terms of memory usage. Other data collected during the experiment, such as the illuminance around the eye area, the distance between the subject’s face and the Kinect sensor, and the FAPUs, are necessary for scaling adjustment and calibration. The data are obtained in a time-stamped manner, with the pupil diameter, FAPs, 3D facial coordinates, and the other parameters captured simultaneously and recorded together. Additionally, they are recorded in a customized output file to facilitate the transfer of the data to the analysis phase.
References
Russell, J.A.: A circumplex model of affect. J. Pers. Soc. Psychol. 39(6), 1161–1178 (1980)
Tangnimitchok, S., O-larnnithipong, N., Ratchatanantakit, N., Barreto, A., Ortega, F.R., Rishe, N.D.: A system for non-intrusive affective assessment in the circumplex model from pupil diameter and facial expression monitoring. In: Kurosu, M. (ed.) HCI 2018. LNCS, vol. 10901, pp. 465–477. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91238-7_38
Lang, P.J.: International affective picture system (IAPS): affective ratings of pictures and instruction manual. Technical report (2005)
Bradley, M.M., Lang, P.J.: Measuring emotion: the self-assessment manikin and the semantic differential. J. Behav. Ther. Exp. Psychiatry 25(1), 49–59 (1994)
Ahlberg, J.: CANDIDE-3 - an updated parameterised face (2001)
Lang, P.J.: The emotion probe: studies of motivation and attention. Am. Psychol. 50(5), 372–385 (1995)
Barrett, L.F., Bliss-Moreau, E.: Affect as a psychological primitive (2009)
Yik, M., Russell, J.A., Steiger, J.H.: A 12-point circumplex structure of core affect. Emotion 11(4), 705–731 (2011)
Rahman, M.: Understanding how the kinect works. In: Beginning Microsoft Kinect for Windows SDK 2.0, pp. 21–40. Apress, Berkeley (2017)
Kuipers, J.B.: Quaternions and Rotation Sequences: A Primer with Applications to Orbits, Aerospace and Virtual Reality. Princeton University Press, Princeton (1999)
Gao, Y., Barreto, A., Adjouadi, M.: Detection of sympathetic activation through measurement and adaptive processing of the pupil diameter for affective assessment of computer users. Am. J. Biomed. Sci 1(4), 283–294 (2009)
Zhang, Y., Ji, Q., Zhu, Z., Yi, B.: Dynamic facial expression analysis and synthesis with MPEG-4 facial animation parameters. IEEE Trans. Circ. Syst. Video Technol. 18(10), 1383–1396 (2008)
Pandzic, I.S., Forchheimer, R.: MPEG-4 Facial Animation: The Standard, Implementation and Applications. Wiley, Hoboken (2003)
Acknowledgement
This research was supported by National Science Foundation grants HRD-0833093 and CNS-1532061, and by the FIU Graduate School Dissertation Year Fellowship awarded to Ms. Sudarat Tangnimitchok.
© 2019 Springer Nature Switzerland AG
Tangnimitchok, S., O-larnnithipong, N., Ratchatanantakit, N., Barreto, A. (2019). Affective Monitor: A Process of Data Collection and Data Preprocessing for Building a Model to Classify the Affective State of a Computer User. In: Kurosu, M. (eds) Human-Computer Interaction. Recognition and Interaction Technologies. HCII 2019. Lecture Notes in Computer Science(), vol 11567. Springer, Cham. https://doi.org/10.1007/978-3-030-22643-5_14