Abstract:
We devised and evaluated a multi-modal machine learning-based system to analyze videos of school classrooms for "positive climate" and "negative climate", which are two dimensions of the Classroom Assessment Scoring System (CLASS) [1]. School classrooms are highly cluttered audiovisual scenes containing many overlapping faces and voices. Due to the difficulty of labeling them (reliable coding requires weeks of training) and their sensitive nature (students and teachers may be in stressful or potentially embarrassing situations), CLASS-labeled classroom video datasets are scarce, and their labels are sparse (just a few labels per 15-minute video clip). Thus, the overarching challenge was how to harness modern deep perceptual architectures despite the paucity of labeled data. By training low-level CNN-based facial attribute detectors (facial expression & adult/child) as well as a direct audio-to-climate regressor, and by integrating low-level information over time using a Bi-LSTM, we constructed automated detectors of positive and negative classroom climate with accuracy (10-fold cross-validation Pearson correlation on 241 CLASS-labeled videos) of 0.40 and 0.51, respectively. These numbers are superior to what we obtained using shallower architectures. This work represents the first automated system designed to detect specific dimensions of the CLASS.
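For illustration only, the sketch below shows one plausible way the temporal-integration stage described in the abstract could be realized: a Bi-LSTM that consumes a sequence of per-segment multimodal feature vectors (e.g., pooled facial-attribute CNN outputs concatenated with audio features) and regresses a single climate score per clip. This is not the authors' code; the feature dimensionality, segment length, pooling strategy, and loss are all assumptions.

```python
# Hypothetical sketch (not the paper's implementation): Bi-LSTM regression from
# per-segment multimodal features to a single classroom-climate score.
import torch
import torch.nn as nn

class ClimateBiLSTM(nn.Module):
    def __init__(self, feat_dim=64, hidden_dim=128):
        super().__init__()
        # Bidirectional LSTM over the temporal sequence of segment features.
        self.bilstm = nn.LSTM(
            input_size=feat_dim,
            hidden_size=hidden_dim,
            batch_first=True,
            bidirectional=True,
        )
        # Map the pooled forward+backward hidden states to one climate score.
        self.regressor = nn.Linear(2 * hidden_dim, 1)

    def forward(self, x):
        # x: (batch, time, feat_dim) -- one feature vector per video segment
        out, _ = self.bilstm(x)            # (batch, time, 2 * hidden_dim)
        pooled = out.mean(dim=1)           # temporal average pooling (assumption)
        return self.regressor(pooled).squeeze(-1)

# Example: a 15-minute clip summarized as 90 ten-second segments of 64-dim features.
model = ClimateBiLSTM()
features = torch.randn(4, 90, 64)          # batch of 4 clips
scores = model(features)                   # predicted climate scores, shape (4,)
loss = nn.MSELoss()(scores, torch.randn(4))  # regression loss against CLASS labels
```

Under this setup, a separate model (or output head) would be trained for each climate dimension, and performance would be reported as the Pearson correlation between predicted and human-coded scores under 10-fold cross-validation, as in the abstract.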
Published in: 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019)
Date of Conference: 14-18 May 2019
Date Added to IEEE Xplore: 11 July 2019