1 Introduction

The ability to program is steadily becoming a critical skill in the 21st century, and the ability to systematically solve a problem by implementing an application has become a requirement for contributing to digital society. Computers and smart devices have become ubiquitous, and the Internet of Things (such as smart devices and appliances) has been integrated into our daily lives. This increases the amount of data being generated and, subsequently, the demand for individuals who can program [1]. Although education programs that target the programming skill set have come a long way, they are still plagued by many issues that hinder computer science education. Beyond fundamental stumbling blocks, such as a lack of numeracy skills [2] and resources, there are more subtle issues that make it difficult to learn to program. Students learn in different ways, and teaching needs to take these differences into account.

Traditional teaching methods [3] attempt to maximize learning by targeting the attributes shared by the majority of the classroom. However, each student is more receptive to particular teaching methods and may not engage with other methods. Another problem is that the turnaround time for finding out which students are not engaging and how to address it can be quite high. Some educators (especially inexperienced educators) only ask for student feedback at the end of each semester when all the content has been completed, instead of more frequent feedback during the semester. The author argues the educator should align student feedback with their teaching methods and adapt accordingly, but if the feedback is sparse, so is its applicability to current students in the classroom.

The rise of ubiquitous computing has allowed students to access content in more flexible ways, thereby promoting more adaptive student learning [4]. However, current methods show that the benefits are limited within a physical classroom setting, where the traditional passive-student approach still prevails. Access to a computer or similar technology becomes a prerequisite in this setting, thereby excluding students who do not have access to these technologies. There is a need for adaptive teaching and learning methods that do not incur a great deal of cost for the student but still provide the personalized student experience that is conducive to more effective learning.

The article introduces such an approach by applying affective computing (where the student’s emotion is derived using a sensor, some processing and machine learning) to achieve adaptive teaching and learning in a physical classroom while limiting the cost to the educator’s side. It begins by defining the problem background, where the underpinning issues in computer science education are briefly unpacked, followed by an outline of adaptive learning and where adaptive teaching is relevant. Background on affective computing methods is then introduced, as it is relevant to the proposed model that is subsequently discussed, along with the preliminary results and recommendations for the implemented prototype. The article ends with a conclusion and future work.

2 Problem Background

Technology has progressed well, and the near-term prospects show even more potential in various domains. It has especially opened opportunities within the space of higher education. There is evidence of improved access to education and improved satisfaction through distance learning [5]. More recently, blended learning has been shown to enhance both the effectiveness and efficiency of more meaningful learning experiences [6]. This shows that there is a continuing inquiry into how best to use technology in higher education.

The fourth industrial revolution is characterized by the fusion of technologies that blur the lines between the physical, digital and biological spheres of our world, and that have evolved at an exponential rate [7]. Emerging technology breakthroughs such as mainstream artificial intelligence, the Internet of Things and 3-D printing have disrupted industries such as manufacturing, logistics and commerce, and the disruption of education is not far behind. These breakthroughs have led to a great deal of demand for the creation of new technologies and, subsequently, for programming and other skills within the space of computer science. This makes computer science education a crucial component of higher education.

2.1 Computer Science Education

Several challenges exist in computer science education, and more needs to be done to cater for the current and upcoming demand for computer science within higher education. Teaching and learning methods exist that directly attempt to address these issues, but certain issues remain open problems. Fundamental stumbling blocks are still present in certain institutions (especially in third world institutions). There is a lack of resources, such as access to equipment and human capital [8], along with low levels of motivation and mathematical competency [2], which hinder the delivery of computer science graduates. If these fundamentals are not understood or accommodated, the student will find it difficult to learn anything. Innovative teaching methods that are cognizant of these constraints need to be introduced for us to provide for the fourth industrial revolution.

In the past decade, in response to proposed science education reform [9], there has been an increase in the use of active as opposed to passive teaching methods to improve computer science education. Historically, participatory teaching methods have always been seen as a critical component in the teaching of computer science [10]. However, the way we engage with the student has changed. Research [11] has shown that there is value in using constructivism in teaching computer science, where knowledge is constructed by the student instead of merely being received from the educator. This has shifted course design to include teaching methods that provide for both the effective and non-effective novice [12]. These teaching methods include pair programming [13], game-based learning [14] and using more accessible programming languages, such as Scratch [15]. However, even when an educator introduces these methods within their context, they may only benefit certain students, leaving the rest in the lurch.

2.2 Adaptive Teaching and Learning

Adaptive teaching and learning methods aim to maximize learning for the target student base by using information derived from the student to adapt teaching to their learning style, which in turn improves the learning process [4]. Adaptive learning can be defined as a learning system that monitors user behavior, interprets it according to a domain-specific model and acts on these interpretations to dynamically facilitate the learning process. Traditional learning methods have been shown to be ineffective in achieving this individual or personalized learning experience [16], and this has led to a pursuit of various teaching methods that may be able to achieve adaptive learning.

Adaptive learning methods discussed in the literature can be divided into four categories [16]. The first category, adaptive interaction, achieves adaptive learning by changing the way the user interfaces with the e-learning setting, for example by changing color or font schemes to accommodate the user. The second and most commonly used category is adaptive course delivery, where course content is changed to make the student feel more comfortable, such as by accommodating subjective assessments and providing the student with alternative paths or selections for course material. The third category consists of systems with content discovery and assembly, where a concerted effort is made during every course design phase to tailor content based on historical student information and behavior. The final category is adaptive collaboration support, where continuous social interaction or communication is used to support the learning process [17].

All the above categories of adaptive learning are enacted within a specific environment, which complies with specific models in adaptive learning. The models in adaptive learning environments include the domain model, the learner model, group models and the adaptation model [16]. The domain model (also known as the application model) focuses on adaptation efforts within the context of the roles, relationships and course elements found in the intended application domain. The learner model adapts when changes occur in student behavior, demographics and achievements. Group models are similar to the learner model, but they dynamically glean information from the characteristics of a group of similar students instead of an individual. The last model, the adaptation model, facilitates adaptation at various layers of abstraction to determine what, when and how certain aspects can be adapted.

Pea discusses two key dimensions required in the teaching process: the social dimension and the technological dimension [18]. Historically, much research has targeted the social dimension for facilitating more effective teaching. We are now beginning to understand how best to leverage the technological dimension of effective teaching, especially in computer science. Adaptive teaching and learning can be seen as the bridge between these two dimensions, and there is value in exploring where the two intersect. There is likewise value in exploring the varying levels of student input and the new attributes that can be leveraged to facilitate better adaptive teaching in the classroom.

The area we explore is similar to the adaptive collaboration support category applied to the learner model, but we explicitly look at how support can be provided to the educator specifically for them to adapt their teaching during a class. The feedback delivered to the educator is derived in a novel way by using affective computing to gain student sentiment on specific content delivery to infer whether teaching is well received, while it is being delivered to the students.

2.3 Affective Computing

Picard defines affective computing as computing that relates to, arises from or influences emotions [19]. One of the key points Picard brings up when proposing the concept of affective computing is its benefit within a teaching and learning setting. The affect derived in these systems provides a key attribute that promotes learning: the ability to determine if the user is exhibiting enthusiasm, excitement or experiencing confusion, frustration and anxiety.

The premise is that certain emotions portrayed by a user are more conducive to learning, while other, potentially negative emotions detract from it. Educational psychologists have recently determined that emotions intertwined in teacher responses and student actions are an integral part of the teaching and learning process [20]. Their pursuit has led to new theoretical frameworks that, rather than focusing on either individuals or environments without any social interactions, leverage these interactions to better understand the classroom learning context. Most of the research focuses on educator emotions and their impact on learning. Emotions such as frustration when a student cannot grasp a concept, or disappointment with a lack of effort from the student, negatively impact the student, and some research attempts to find ways of regulating these emotions [21, 22]. More recent research explores student emotions, such as enjoyment, pride and hope, and their relationship with the learning process [23]. However, the primary instruments used to capture or determine these emotions are surveys or interviews, which make deriving insights from the classroom a more “offline” exercise.

Thankfully, technology has progressed to a point where a machine can be used to determine the emotion of a user, thereby automating the capture of these user emotions. This opens up an avenue of research that leverages the capturing of emotion within an educational setting in a more “online” manner within the context of a physical classroom.

2.4 Similar Work

New entrants within this context attempt to derive emotion from students in a physical classroom setting using various physiological sensors. There are physical manifestations or attributes a user portrays when they experience emotion, and by capturing these attributes, one can derive their approximate emotion. Historically, physiological signals such as skin conductance or heart rate have been used to determine user emotion in various contexts, such as lie detection. However, they come with their limitations [24]. One of these constraints is the requirement of special sensor equipment for each participant, together with the lack of portability of that equipment, which limits its practicality within a physical classroom setting.

Shen, Wang and Shen use a collection of biofeedback devices to collect physical data such as heart rate, skin conductance, blood volume pressure and brain waves for every student [25]. Using labelled positive and negative emotion data from these sensors to train a Support Vector Machine (SVM) and a K-Nearest Neighbor (KNN) classifier, they achieved between 60.8% and 86.3% accuracy, depending on how many sensors are factored in. However, some users are not comfortable with wearing these sensors or providing these attributes because they feel it is quite intrusive, and the approach may not yet be practical within the physical classroom context.

Wu, Tzeng and Huang capture eye movements, brain waves and heartbeat while the student is playing a digital game designed to teach Newton’s law of motion [26]. They specifically outline that there is a significant relationship between these physiological attributes and effective learning. However, similar to other work, it too suffers from privacy, hardware and practicality issues. More work is needed on models that capture student emotion in a less intrusive manner with minimal overhead.

3 Experiment Setup

The study serves as exploratory research that allows for further insights into deriving emotion within the domain of the physical classroom without being too intrusive. Once sufficient background on the problem domain and methods has been explored, a model is formed, along with a basic implementation for a pilot study, to derive insights on whether there is value in using computer vision methods to derive emotion within a physical classroom.

3.1 Data Collection

In the pilot study, video footage of a small group of computer science lab students was captured for three classes using a Canon 80D placed in front of the classroom. The camera was set to capture video at a resolution of 1920 by 1080 at 60 frames per second. Nominal lighting was provided in the environment, and any occlusions within the scene were kept to a minimum. Each video sample contained footage from the beginning of the class until the end of the class, with an average duration of 80 min.

3.2 Data Analysis

Once the video was collected, the methods described in the following section were applied to capture, process and classify the emotions relevant to the study. The emotion results are plotted for the observer, along with the emotional mean. The emotions measured include:

  • anger

  • contempt

  • disgust

  • fear

  • happiness

  • neutral

  • sadness

  • surprise

The classification outcomes at various stages in the video footage are then observed and any important shifts are noted and collated to derive insights for the study.
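As a rough illustration of how shifts between stages of the footage might be noted, the following minimal sketch flags every point at which the highest-scoring emotion changes from one time window to the next. The article does not define what counts as an "important shift", so the rule used here, and the function names `dominant` and `dominant_shifts`, are assumptions made purely for illustration.

```python
def dominant(scores):
    """Return the emotion with the highest score in one score mapping."""
    return max(scores, key=scores.get)


def dominant_shifts(window_means):
    """List (window_index, previous, current) where the dominant emotion changes.

    `window_means` is a sequence of {emotion: mean_score} dicts in time order.
    The "dominant emotion changed" rule is an assumed stand-in for whatever
    criterion an observer would actually apply.
    """
    shifts = []
    previous = None
    for index, scores in enumerate(window_means):
        current = dominant(scores)
        if previous is not None and current != previous:
            shifts.append((index, previous, current))
        previous = current
    return shifts
```

An observer could then inspect only the flagged windows instead of scanning the full plot; a threshold on the size of the change would be an equally plausible alternative rule.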

4 Model

Computer vision methods are employed to derive each student’s emotion in the physical classroom, as depicted in Fig. 1. Once captured, each video frame is sent for region of interest (ROI) segmentation, which in this case is face detection in the scene using pre-trained Haar cascades. Any ROI sub-images smaller than 40 by 40 pixels are discarded, because it is difficult to derive an emotion at such a low resolution with the current emotion classification method. Each ROI is then processed further to derive emotion scores for each category using Microsoft’s Cognitive Service Face API (version 1.0). The emotion scores for each ROI are then returned and consolidated into a mean emotion score for each category over a predefined time window, which can be set by the observer. The scores and significant events are then displayed by the report module, which provides a brief notification on whether a class is going well or whether the educator should adjust their teaching accordingly.

Fig. 1. A model for achieving adaptive teaching and learning using computer vision methods to derive emotion.
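The ROI size filter and the consolidation step of the model can be sketched as follows. This is a minimal illustration and not the pilot implementation: the function names, the choice to require both sides of the ROI to be at least 40 pixels, and the use of fixed, non-overlapping windows are assumptions. Face detection and the call to the emotion service are deliberately left out, so the sketch starts from scores that have already been returned.

```python
from collections import defaultdict

EMOTIONS = ("anger", "contempt", "disgust", "fear",
            "happiness", "neutral", "sadness", "surprise")
MIN_ROI_SIDE = 40  # pixels; smaller face regions are discarded


def keep_roi(width, height, min_side=MIN_ROI_SIDE):
    """Return True when a face region is large enough to classify.

    Assumption: a region is kept only if both sides reach `min_side`.
    """
    return width >= min_side and height >= min_side


def windowed_means(samples, window_seconds):
    """Average per-category scores over fixed, non-overlapping time windows.

    `samples` is an iterable of (timestamp_seconds, scores) pairs, where
    `scores` maps emotion names to values in [0, 1] for one face ROI.
    Missing categories are treated as 0.0.
    Returns {window_index: {emotion: mean_score}}.
    """
    sums = defaultdict(lambda: dict.fromkeys(EMOTIONS, 0.0))
    counts = defaultdict(int)
    for timestamp, scores in samples:
        window = int(timestamp // window_seconds)
        for emotion in EMOTIONS:
            sums[window][emotion] += scores.get(emotion, 0.0)
        counts[window] += 1
    return {
        window: {e: total / counts[window] for e, total in totals.items()}
        for window, totals in sums.items()
    }
```

The window length corresponds to the observer-set time window in the text; a shorter window gives the educator faster feedback at the cost of noisier means.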

5 Results and Recommendations

Fig. 2. An example of the lab class group used for the pilot study, where the face ROIs have been removed for privacy reasons.

The pilot implementation successfully derived the emotion scores for each of the students in the physical classroom. As shown in Fig. 2, even at a side profile view, the faces of most of the participating students in the classroom can be captured for further processing. Once the ROI images are sent to the Microsoft API, the emotion scores are successfully returned in JSON format, as seen in Fig. 3. The emotion scores are then consolidated and parsed by the report module for display to the observer or educator. The observer can then use the report module to view the current and mean emotion scores and adjust teaching accordingly in future classes delivered to the same group of students.

Fig. 3. An example of the emotion scores for one captured face in the class, depicting the neutral emotion.
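To make the parsing step concrete, the sketch below extracts per-face emotion scores from a JSON payload and picks the highest-scoring category. The payload is illustrative only: its values are invented, and its layout (a list of faces, each with a `faceAttributes.emotion` object) is an assumption about the Face API v1.0 detect response rather than a reproduction of the figure. The function names are likewise hypothetical.

```python
import json

# Illustrative payload: values are made up and the layout is an assumption.
SAMPLE_RESPONSE = """
[
  {
    "faceRectangle": {"top": 120, "left": 340, "width": 64, "height": 64},
    "faceAttributes": {
      "emotion": {
        "anger": 0.0, "contempt": 0.01, "disgust": 0.0, "fear": 0.0,
        "happiness": 0.02, "neutral": 0.95, "sadness": 0.01, "surprise": 0.01
      }
    }
  }
]
"""


def emotion_scores(response_text):
    """Return one {emotion: score} dict per detected face in the payload."""
    faces = json.loads(response_text)
    return [
        face["faceAttributes"]["emotion"]
        for face in faces
        if "emotion" in face.get("faceAttributes", {})
    ]


def top_emotion(scores):
    """Return the category with the highest score for one face."""
    return max(scores, key=scores.get)
```

For the sample payload, `top_emotion(emotion_scores(SAMPLE_RESPONSE)[0])` evaluates to `"neutral"`, matching the kind of result the figure depicts.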

Although the pilot implementation showed that it is possible to derive emotion in a physical classroom and use it to adjust teaching, some issues were encountered that hinder the capturing of student emotions in the classroom. These issues are mostly attributed to environmental or hardware constraints. For students further back in the classroom, ROI segmentation would fail at times, and in some cases where their faces were captured, deriving the emotion would be unsuccessful due to the low resolution of the ROI. The number of frames processed within a period was also limited by the API and the available bandwidth, which can slow down processing of the frames, thereby warranting the investigation of a local emotion recognition method for further implementations in the study.
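The scale of the throughput constraint can be estimated with simple arithmetic. The sketch below computes how many frames an 80 min recording at 60 frames per second contains, and the smallest frame stride that keeps the number of per-face requests within a given budget. The budget figures used in the usage note are hypothetical; the article does not state the actual API limits, and the function names are illustrative.

```python
import math


def total_frames(fps, minutes):
    """Number of frames in a recording of the given length."""
    return fps * 60 * minutes


def frame_stride(fps, max_calls_per_minute, faces_per_frame=1):
    """Smallest N such that sending every N-th frame stays within budget.

    One request per face ROI is assumed, so the number of requests per
    minute at full frame rate is fps * 60 * faces_per_frame.
    """
    calls_at_full_rate = fps * 60 * faces_per_frame
    return max(1, math.ceil(calls_at_full_rate / max_calls_per_minute))
```

For example, `total_frames(60, 80)` gives 288000 frames per class, and under a purely hypothetical budget of 20 requests per minute with 10 visible faces, `frame_stride(60, 20, faces_per_frame=10)` gives 1800, that is, one processed frame every 30 seconds. A local recognition method would remove the request budget from this calculation.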

5.1 Privacy and Ethical Considerations

Deriving emotion for adapting teaching also comes with privacy and ethical implications. As with any computer vision technology that involves humans, there is a chance that it can infringe on one’s privacy. The use of emotion score information beyond the scope of the work also presents problems. For example, general strain theory (GST) posits that strain or stressors increase the likelihood of negative emotions such as anger and frustration, which can lead to crime and delinquency [27]. If institutions were to use the data collected in the classroom to screen for potential criminals, it may not sit well with society. Moreover, laws such as the EU General Data Protection Regulation (GDPR) have set a precedent for how data is processed within the public sector [28]. Care would need to be taken with regard to where and how information, such as footage of students in a physical classroom, is being sent and used.

5.2 Insights

In the pilot study, there was also a residual impact when using the model within a classroom setting. Students were willing to participate in the pilot because it may benefit their learning experience. The model brings about a degree of educator awareness that would normally only be seen in a seasoned educator. However, some aspects could be surprising even to a seasoned educator: for example, students who maintain a single emotional state may not necessarily be learning well, and the surprise indicator may be related to attentiveness. The adjustments made in response to prolonged negative sentiment do promote more interactivity on the educator’s part. The report module also confirms that student interaction increases with more positive emotions in a classroom. This shows there is value in using the model with less experienced educators who cannot intuitively get a “feel” for the classroom.

5.3 Computer Science Education

Participatory teaching methods within the space of computer science education are on the rise. Evidence suggests there is a need for a shift in course design that addresses the unique methods of learning for each student [12]. The methods introduced thus far adjust content such as the type of programming language or the target problem to maximize learning [13,14,15], but this is not the only dimension that should be pursued for more personalized learning. There is room for deriving other student attributes, such as emotion or weak areas using technology that serve to assist the educator, especially in the sciences. Being able to quantify the extent to which student learning takes place is a promising value proposition and may be useful in the future.

Overall, one can ask whether there is value in pursuing emotion recognition for computer science education or for education as a whole. Although one has to be cognizant of the constraints experienced within this context, the approach still achieves the use case, and it opens up a further avenue of research that may assist educational psychologists and educators alike in determining conducive conditions for student learning.

6 Conclusion

Changing the education landscape to include more participatory teaching methods to maximize student learning has been a challenge especially in the sciences. It is further complicated by the fact that certain students do not engage with certain participatory methods. An experienced educator can pick up any distance between these teaching methods and their students to facilitate adaptive teaching and learning.

Advances in the field of computer vision have shown potential in other domains, and an attractive inventory of methods has been identified, which warrants the investigation of using these technologies to achieve collaborative support-based adaptive teaching and learning for a physical classroom of students. By leveraging innovation within the field of computer vision, many application domains can benefit from insights derived from a scene to promote user effectiveness.

Although automation efforts within the fourth industrial revolution are typically not well received because they can lead to job loss, this study shows there is also potential in using the technology as an assistive mechanism for fields that require the “human touch”. The current and potential benefits cannot be ignored as we endeavor to find the next generation of learning, one which is aware of the ideal conditions necessary for individual student learning.