Abstract
There is still considerable disagreement on key aspects of affective computing, including how affect itself is conceptualized. Using a multi-modal dataset collected while students watched instructional videos and answered questions on a learning platform, we compared the two key paradigms of affect representation: (1) affect as a set of discrete states and (2) affect as a combination of attributes in a two-dimensional space. We specifically examined a set of discrete learning-related affective states (Satisfied, Confused, and Bored) that are hypothesized to map to specific locations within the Valence-Arousal dimensions of the Circumplex Model of Emotion. For each paradigm, five human experts labeled student affect on the dataset. Using their labels, we investigated two major research questions: (1) whether the hypothesized mappings between discrete affects and Valence-Arousal are valid and (2) whether affect labeling is more reliable with discrete affects or with Valence-Arousal. Contrary to expectations, the results show that the discrete labels did not map directly to Valence-Arousal quadrants in the Circumplex Model of Emotion. This indicates that the experts perceived and labeled affect rather differently under the two paradigms. On the other hand, the inter-rater agreement results show that the experts moderately agreed with each other within both paradigms. These results imply that researchers and practitioners should consider how affect information would operationally be used in an intelligent system when choosing between the two paradigms.
Keywords
- Affective state labeling
- Circumplex Model of Emotion
- Inter-rater agreement
- Intelligent tutoring systems
- Affective computing
1 Introduction
Affect has become an important area of research within learning [1,2,3]. Data labeling is a preliminary step towards training machine learning models that provide affect-related analytics to teachers and learners. However, the related literature lacks agreement even on how affect itself is conceptualized. There are two major paradigms for affect representation: (1) affect as a set of discrete states [4,5,6,7,8,9] and (2) affect as a combination of attributes in a two-dimensional space [11].
There are several benefits to viewing student affect as a set of discrete states. One such benefit is that students’ actual states are easier to understand, which in turn makes it easier to drive customized interventions. However, labeling discrete affective states challenges observers to distinguish between closely related states. For instance, confusion and frustration are often treated as separate affective states (e.g., [8]), but Liu et al. [10] argue that they may simply represent different ranges of a continuum. Researchers using discrete sets of affective states often also struggle with how to distinguish neutral affect from mild affect and how to handle uncommon affect outside the core affect labeling scheme. These challenges can pose major risks to the quality of affect labeling in ways that are not easily seen in overall inter-rater agreement values that cut across large numbers of constructs. These issues may particularly emerge when affect labelers have limited training or are asked to label data in which the video is sometimes ambiguous due to factors such as facial occlusion, adverse pose variations, or gum chewing.
In this paper, we study this issue in a focused fashion by examining a set of discrete affective states that can be reasonably expected to correspond to specific locations within the Circumplex Model of Emotion [11]. Specifically, we study (see Fig. 1): Satisfied, which can be hypothesized to map to Positive Valence (regardless of Arousal); Bored, which can be hypothesized to map to Negative Valence and Low Arousal; and Confused, which can be hypothesized to map to Negative Valence and High Arousal. Using the student dataset in [12] and the Human-Expert Labeling Process (HELP) [13] as a baseline labeling protocol, we test whether these mappings between discrete affective states and Valence-Arousal are valid and whether affect labeling is more reliable with discrete affective states or with Valence-Arousal.
2 Data Collection
In this study, we used a subset of a larger student dataset previously collected through authentic classroom pilots [12]. These pilots took place in an afterschool Math course in an urban high school in Turkey. During the pilots, the students used an online educational platform to watch instructional videos and solve related questions. Meanwhile, our data collection application ran in the background to collect two video streams: (1) student appearance videos from the camera (to monitor observable cues in the student’s face or upper body) and (2) student desktop videos (to monitor contextual information).
3 Labeling Tool, Human-Experts, and Training
A labeling tool was developed and customized for use in multiple labeling experiments. In Fig. 2, a sample view for labeling Valence is shown.
Using HELP [13] as a baseline labeling protocol, five human experts with backgrounds in Psychology/Educational Psychology were recruited and trained (see Tables 1 and 2 for operational definitions of labels). Whenever they observed a state change, the experts provided their Valence-Arousal or discrete affect labels using all available cues (e.g., student video/audio, desktop recording with mouse cursor locations, and any relevant contextual information from the device and content platform).
In total, the human experts first labeled seven hours of student data with Valence-Arousal labels. One week later, we asked them to label the same data with discrete affect labels. Note that although the experts labeled Arousal using three levels, we merged the Low and Medium labels into a single Low class when analyzing the labeled data, based on the experimental results outlined in [14].
4 Comparing Discrete Affect Labels to Valence-Arousal Labels
4.1 Pre-processing of Label Data
To analyze labeling output data, both for discrete affect and Valence-Arousal labeling outputs, two pre-processing steps were taken: First, we applied windowing on the labeling output data to obtain aligned instance-wise labels of each individual expert. Second, to facilitate analysis, we derived a consensus label from all the expert labels for each instance, using majority voting in each case.
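The two pre-processing steps can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the segment format `(start_s, end_s, label)`, the window length, the rule that a window takes the label covering most of its duration, and the policy of returning `None` on ties are all assumptions, since the text does not specify them.

```python
from collections import Counter


def window_labels(segments, window_s, total_s):
    """Turn one expert's (start_s, end_s, label) segments into one label per window.

    Assumption: each window takes the label that covers most of its duration.
    Windows with no labeled coverage get None.
    """
    labels = []
    start = 0
    while start < total_s:
        end = min(start + window_s, total_s)
        coverage = Counter()
        for seg_start, seg_end, label in segments:
            overlap = min(end, seg_end) - max(start, seg_start)
            if overlap > 0:
                coverage[label] += overlap
        labels.append(coverage.most_common(1)[0][0] if coverage else None)
        start = end
    return labels


def consensus(expert_labels):
    """Majority vote per instance across experts.

    Assumption: a tie for the top label yields None (no consensus).
    """
    result = []
    for votes in zip(*expert_labels):
        counts = Counter(v for v in votes if v is not None).most_common()
        if not counts:
            result.append(None)
        elif len(counts) > 1 and counts[0][1] == counts[1][1]:
            result.append(None)
        else:
            result.append(counts[0][0])
    return result
```

With five experts and three classes, a 2-2-1 split is possible, so some tie-handling policy is needed; dropping such instances is only one option.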
4.2 Metrics for Analysis
The derived consensus labels were then compared with each other to measure the degree to which each discrete affective state mapped to each Valence-Arousal quadrant. Note that the hypotheses for how discrete affective states would map to Valence-Arousal were presented in the Introduction (Fig. 1). We calculated the degree of mapping using Precision, Recall, and F1 measures. For these calculations, the labeled set (e.g., discrete affective states) acts as the true labels, whereas the mapped set (e.g., Valence-Arousal mapped to discrete affective states as hypothesized) serves as the predictions. Precision is calculated as the ratio of correct predictions (i.e., true positives) to all predictions (i.e., the sum of true positives and false positives), whereas Recall is calculated as the ratio of correct predictions to all true labels (i.e., the sum of true positives and false negatives). The F1 measure is calculated as the harmonic mean of Precision and Recall, taking into account the trade-off between the two. In addition, we checked inter-rater agreement for the different labeling tasks (i.e., Discrete Affects, Arousal, Valence) to assess the reliability of the obtained label data. As proposed in HELP [13], we used Krippendorff’s alpha to compute inter-rater agreement among the experts.
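As a concrete reading of these definitions, the following sketch computes Precision, Recall, and F1 for one target class from two aligned label sequences. The function name and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration.

```python
def precision_recall_f1(true_labels, predicted_labels, target):
    """Precision, Recall, and F1 for a single class.

    true_labels: the labeled set (e.g., consensus discrete affect labels).
    predicted_labels: the mapped set (e.g., Valence-Arousal mapped to affects).
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    # Assumed convention: an undefined ratio is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, with one true positive, no false positives, and one false negative, Precision is 1.0, Recall is 0.5, and F1 is 2 · 1.0 · 0.5 / 1.5 ≈ 0.67.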
4.3 Methods for Analysis
To investigate whether the discrete affective states (i.e., Satisfied, Bored, and Confused) actually map to the hypothesized Valence-Arousal quadrants, the degree of mapping was computed using the final labels for the following mapping/comparison sets:
- Valence vs. Discrete Affect-to-Valence: We compared Valence labels to discrete affect labels, where affect labels were mapped to Valence labels using: Satisfied to Positive Valence, and Bored/Confused to Negative Valence.
- Arousal vs. Discrete Affect-to-Arousal: We compared Arousal labels to discrete affect labels, where affect labels were mapped to Arousal labels using: Bored to Low Arousal, and Confused to High Arousal. Note that Satisfied samples were disregarded in this case since we hypothesized that they could map to both Low and High Arousal on the Circumplex Model of Emotion (see Fig. 1).
- Discrete Affect vs. Valence/Arousal-to-Discrete Affect: We compared discrete affect labels to Valence-Arousal labels, where Valence-Arousal label pairs were mapped to discrete affect labels using: Low/High Arousal & Positive Valence to Satisfied, Low Arousal & Negative Valence to Bored, and High Arousal & Negative Valence to Confused.
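The three mappings listed above can be written as small lookup functions. This is a sketch under assumptions: the label strings are hypothetical, and returning `None` for any label or pair not covered by the hypothesized mappings is an assumed policy, not something stated in the text.

```python
def collapse_arousal(level):
    """Merge the three annotated Arousal levels into two analysis classes."""
    return {"Low": "Low", "Medium": "Low", "High": "High"}.get(level)


def affect_to_valence(affect):
    """Satisfied -> Positive; Bored or Confused -> Negative."""
    return {
        "Satisfied": "Positive",
        "Bored": "Negative",
        "Confused": "Negative",
    }.get(affect)


def affect_to_arousal(affect):
    """Bored -> Low; Confused -> High; Satisfied is disregarded (None)."""
    return {"Bored": "Low", "Confused": "High"}.get(affect)


def valence_arousal_to_affect(valence, arousal):
    """Map a (Valence, Arousal) pair to the hypothesized discrete affect."""
    if valence == "Positive" and arousal in ("Low", "High"):
        return "Satisfied"
    if valence == "Negative" and arousal == "Low":
        return "Bored"
    if valence == "Negative" and arousal == "High":
        return "Confused"
    return None  # assumed: pairs outside the hypothesized mapping are unmapped
```

Instances for which a mapping returns `None` would be excluded from the corresponding comparison, as is done for Satisfied samples in the Arousal case.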
5 Results
5.1 Mapping Between Discrete Affect and Valence-Arousal Labels
The Precision, Recall, and F1 measures calculated for each mapping set are summarized in Table 3. As these results indicate, relatively higher F1 measures (consistent for both state-specific and overall results) were achieved when discrete affect labels were mapped to Positive/Negative Valence (i.e., Valence vs. Discrete Affect-to-Valence). However, the F1 values were lower when discrete affect labels were mapped to High/Low Arousal (i.e., Arousal vs. Discrete Affect-to-Arousal). Although the overall F1 measures seemed reasonable when Valence-Arousal labels were mapped to discrete affects (i.e., Discrete Affect vs. Valence/Arousal-to-Discrete Affect), the state-specific measures revealed an inconsistency. A likely reason is that High Arousal samples made up less than ~1.2% of the data, so the samples labeled as Confused were drawn mostly from the Low Arousal samples. This issue was most visible in the Recall and F1 results for the Valence-Arousal vs. Discrete Affect mapping. Although we disregarded Satisfied samples in the Arousal vs. Discrete Affect-to-Arousal case, on the hypothesis that they could map to both Low and High Arousal, we also checked and observed that 99.8% of all Satisfied instances mapped to Low Arousal and only 0.2% to High Arousal. This scarcity of High Arousal labels is common to all three discrete affective states: Satisfied (0.2% High Arousal), Bored (2.2% High Arousal), and Confused (3.3% High Arousal).
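The per-state High Arousal percentages reported above amount to a simple cross-tabulation of the two consensus label sequences. A minimal sketch, assuming aligned per-instance labels and the hypothetical label string `"High"`:

```python
from collections import Counter


def high_arousal_share(affect_labels, arousal_labels):
    """Fraction of instances labeled High Arousal within each discrete affect."""
    totals, high = Counter(), Counter()
    for affect, arousal in zip(affect_labels, arousal_labels):
        if affect is None or arousal is None:
            continue  # assumed: instances without a consensus label are skipped
        totals[affect] += 1
        if arousal == "High":
            high[affect] += 1
    return {affect: high[affect] / totals[affect] for affect in totals}
```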
5.2 Inter-rater Agreement for Discrete Affects and Valence-Arousal Labeling
The inter-rater agreement results for discrete affect labeling compared to Valence-Arousal labeling are given in Table 4. The average of the confusion matrices computed for the discrete affect labels over all pairs of experts (i.e., every pair among the five experts) is given in Table 5. As these results indicate, inter-rater agreement was lower for discrete affect labeling, and the pairwise confusion results showed that the experts had difficulty differentiating Satisfied from either of the other two states (Bored or Confused).
6 Conclusion
In this paper, through a comparative approach, we investigated the two key paradigms of how affect is represented: (1) affect as a set of discrete states and (2) affect as a combination of attributes in a two-dimensional space. We specifically examined a set of discrete affective states (Satisfied, Confused, and Bored) that can be reasonably expected to map to specific locations within the Valence-Arousal dimensions of the Circumplex Model of Emotion [11]. We investigated two major research questions: (1) whether these mappings between discrete affects and Valence-Arousal are valid and (2) whether affect labeling is more reliable with discrete affects or with Valence-Arousal. To investigate these questions, we used HELP [13] as a baseline labeling protocol. Using HELP, five human experts labeled seven hours of student data with Valence-Arousal and discrete affect labels.
The relatively low F1 measures (see Table 3) indicate that the discrete affect labels (i.e., Satisfied, Bored, and Confused) do not map directly to Valence-Arousal quadrants in the Circumplex Model of Emotion [11]. This shows that the human experts perceived and labeled affect rather differently under the two paradigms, although we reasonably expected the discrete affects to map seamlessly onto the model. On the other hand, the inter-rater agreement results (see Table 4) show that the experts moderately agreed with each other in both discrete affect labeling and Valence-Arousal labeling.
These results have two important implications for researchers in the learning analytics field. First, how affect is conceptualized in one paradigm may not transfer seamlessly to another (i.e., discrete affective states do not map directly to Valence-Arousal quadrants). Therefore, researchers need to decide on the affect labels of interest at the beginning of their research with this limitation in mind. Second, both discrete affect labeling and Valence-Arousal labeling resulted in moderate consensus among the experts. Therefore, researchers should consider how affect information would ultimately be used in a learning system (e.g., affect-aware interventions, feedback to content, etc.) when choosing between Valence-Arousal and discrete affect labeling to generate ground-truth labels for model development.
References
1. Sabourin, J., Mott, B., Lester, J.C.: Modeling learner affect with theoretically grounded dynamic Bayesian networks. In: D’Mello, S., Graesser, A., Schuller, B., Martin, J.-C. (eds.) ACII 2011. LNCS, vol. 6974, pp. 286–295. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24600-5_32
2. Jaques, N., Conati, C., Harley, J.M., Azevedo, R.: Predicting affect from gaze data during interaction with an intelligent tutoring system. In: Trausan-Matu, S., Boyer, K.E., Crosby, M., Panourgia, K. (eds.) ITS 2014. LNCS, vol. 8474, pp. 29–38. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07221-0_4
3. Pardos, Z.A., Baker, R.S., San Pedro, M.O.C.Z., Gowda, S.M., Gowda, S.M.: Affective states and state tests: investigating how affect and engagement during the school year predict end of year learning outcomes. J. Learn. Anal. 1(1), 107–128 (2014)
4. Kapoor, A., Picard, R.W.: Multimodal affect recognition in learning environments. In: International Conference on Multimedia (2005)
5. Kapoor, A., Burleson, W., Picard, R.W.: Automatic prediction of frustration. Int. J. Hum.-Comput. Stud. 65(8), 724–736 (2007)
6. Hoque, M.E., McDuff, D.J., Picard, R.W.: Exploring temporal patterns in classifying frustrated and delighted smiles. Trans. Affect. Comput. 65(8), 323–334 (2012)
7. Grafsgaard, J.F., Wiggins, J.B., Boyer, K.E., Wiebe, E.N., Lester, J.C.: Automatically recognizing facial indicators of frustration: a learning-centric analysis. In: Affective Computing and Intelligent Interaction (2013)
8. Bosch, N., D’Mello, S., Baker, R., Ocumpaugh, J., Shute, V., Ventura, M., Zhao, W.: Automatic detection of learning centered affective states in the wild. In: International Conference on Intelligent User Interfaces (2015)
9. Arroyo, I., Cooper, D.G., Burleson, W., Woolf, B.P., Muldner, K., Christopherson, R.: Emotion sensors go to school. In: Artificial Intelligence in Education (AIED), vol. 200, pp. 17–24 (2009)
10. Liu, Z., Pataranutaporn, V., Ocumpaugh, J., Baker, R.S.: Sequences of frustration and confusion, and learning. In: Proceedings of the 6th International Conference on Educational Data Mining, pp. 114–120 (2013)
11. Russell, J.A.: A circumplex model of affect. J. Pers. Soc. Psychol. 39(6), 1161 (1980)
12. Okur, E., Alyuz, N., Aslan, S., Genc, U., Tanriover, C., Arslan Esme, A.: Behavioral engagement detection of students in the wild. In: André, E., Baker, R., Hu, X., Rodrigo, M., du Boulay, B. (eds.) AIED 2017. LNCS (LNAI), vol. 10331, pp. 250–261. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-61425-0_21
13. Aslan, S., Mete, S.E., Okur, E., Oktay, E., Alyuz, N., Genc, U., Stanhill, D., Arslan Esme, A.: Human expert labeling process (HELP): Towards a reliable higher-order user state labeling process and tool to assess student engagement. Educ. Technol. 57(1), 53–59 (2017)
14. Aslan, S., Okur, E., Alyuz, N., Arslan Esme, A., Baker, R.S.: Human expert labeling process: valence-arousal labeling for students’ affective states. In: Proceedings of the 8th International Conference in Methodologies and Intelligent Systems for Technology Enhanced Learning. Springer, Cham (2018)
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Aslan, S., Okur, E., Alyuz, N., Arslan Esme, A., Baker, R.S. (2018). Towards Human Affect Modeling: A Comparative Analysis of Discrete Affect and Valence-Arousal Labeling. In: Stephanidis, C. (eds) HCI International 2018 – Posters' Extended Abstracts. HCI 2018. Communications in Computer and Information Science, vol 851. Springer, Cham. https://doi.org/10.1007/978-3-319-92279-9_50
DOI: https://doi.org/10.1007/978-3-319-92279-9_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92278-2
Online ISBN: 978-3-319-92279-9
eBook Packages: Computer Science, Computer Science (R0)