Incorporating uncertainties in student response modeling by loss function regularization
Introduction
The field of deep learning has exploded in recent years, with the study of numerous architectures such as convolutional neural networks (CNNs) for imaging and computer vision [1], variants of recurrent neural network (RNN) models [2], attention mechanisms [3], and memory networks [4]. Despite the progress deep neural network based models have made, most researchers still treat these models as black boxes. Many, including us, have argued that understanding a model and how it makes decisions is as important as its performance. Can we trust these predictions? When a model is not certain about its predictions, we could offload decisions to human experts or to more traditional algorithms. In this study, we address this problem by developing a loss function regularization term for deep learning models that learns an uncertainty scale (or confidence level) for each prediction.
Deep learning models are primarily trained using backpropagation [5], and once trained, the weights are fixed. Thus, we can think of these models as deterministic functions: for each input, there is always a corresponding output. But what if we have an input that is far from the distribution of the training data? How do we interpret the output? A real-world example is a 2016 car accident in which an assisted driving system confused the bright side of a trailer with the sky [6]. Directly targeting this problem is difficult. A more feasible approach is to make the model learn some characteristics of the input data in a supervised or unsupervised manner. Several works have tried to incorporate uncertainties into deep models [6], [7], [8]; they all tackle this problem using a Bayesian approach.
The Bayesian approach to deep learning, or Bayesian deep learning, assumes the weights of a deep neural network are not fixed but are instead sampled from a probability distribution (say, a Gaussian). The prediction is calculated by integrating over the posterior probability distribution. However, this integration is usually intractable, so approximation methods are often used. In this study, we consider two types of uncertainty: epistemic uncertainty (model uncertainty), which arises because numerous models could have generated the dataset at hand, and aleatoric uncertainty (input-dependent uncertainty), which is caused by noise in the observations or measurements. We can estimate epistemic uncertainty by sampling the weights of a network using Monte Carlo methods [6]. Aleatoric uncertainty can be estimated by applying a noise distribution to the output of the network.
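As a concrete illustration of the Monte Carlo weight-sampling idea, epistemic uncertainty is commonly approximated with MC dropout: dropout is kept active at test time, several stochastic forward passes are run, and the spread of the outputs estimates model uncertainty. The sketch below uses a toy, untrained NumPy network; the layer sizes, dropout rate, and number of passes are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer network with fixed (untrained) weights.  Dropout
# stays ON at test time (MC dropout), so each forward pass samples a
# different sub-network; output spread estimates epistemic uncertainty.
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(4, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mc_forward(x, p_drop=0.5):
    h = np.maximum(x @ W1, 0.0)          # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop  # fresh Bernoulli dropout mask
    h = h * mask / (1.0 - p_drop)        # inverted-dropout scaling
    return sigmoid(h @ W2)

x = rng.normal(size=(1, 8))
samples = np.stack([mc_forward(x) for _ in range(100)])  # T = 100 passes
pred_mean = samples.mean(axis=0)      # prediction
epistemic_var = samples.var(axis=0)   # model (epistemic) uncertainty
```

Inputs far from the training distribution tend to produce larger disagreement between the sampled sub-networks, which is exactly the signal one wants for the out-of-distribution problem described above.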
In this paper, we investigate these two kinds of uncertainty in the context of student response modeling, more specifically, knowledge tracing. We select knowledge tracing because a synthetic dataset can be used to evaluate uncertainties quantitatively. Moreover, the two most popular domains in which these deep learning models are evaluated are image processing (computer vision) and natural language processing. We argue that knowledge tracing, as a core component of building intelligent tutoring systems, should receive more visibility from the machine learning community because it is an intriguing, complex problem with a diverse set of interested parties.
Knowledge tracing allows the inference of a student's mastery level of a skill and is thus key to building intelligent tutoring systems. Conventional statistical models include Bayesian Knowledge Tracing (BKT) [9] and Performance Factor Analysis (PFA) [10]. Deep Knowledge Tracing (DKT) [11], which uses recurrent neural networks to model skills, and the Dynamic Key-Value Memory Network (DKVMN) [12], which is based on memory networks, have also been proposed in recent years. However, these deep learning based knowledge tracing models are difficult to explain and are not transparent. In this paper, we provide an uncertainty score for each prediction, thus helping to alleviate concerns caused by these black box models. We summarize our contributions as follows:
- We empirically show that using Monte Carlo sampling alone is insufficient to model data variance in knowledge tracing applications.
- We show how to incorporate sensible uncertainties in deep knowledge tracing by explicitly regularizing the loss function. To the best of our knowledge, this is the first work that evaluates uncertainties in deep knowledge tracing.
- We evaluate our methodology on three different real datasets as well as in a more controlled way using a novel synthetic dataset. The results show that our model produces reasonable uncertainties as well as comparable results to existing DKT models. Using our synthetic data, we show that our uncertainty modeling follows the underlying generated uncertainties more closely than other uncertainty modeling approaches.
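For context, a DKT-style model consumes a student's sequence of (skill, correctness) interactions and, at each step, outputs a probability of answering each skill correctly. The minimal NumPy sketch below uses randomly initialized (untrained) weights and a vanilla RNN cell; all sizes and the encoding scheme are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_skills = 5
in_dim = 2 * n_skills   # one-hot over (skill, correct/incorrect) pairs
hid = 16

# Randomly initialized weights stand in for trained parameters.
Wxh = rng.normal(scale=0.1, size=(in_dim, hid))
Whh = rng.normal(scale=0.1, size=(hid, hid))
Why = rng.normal(scale=0.1, size=(hid, n_skills))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dkt_forward(interactions):
    """interactions: list of (skill_id, correct) pairs for one student."""
    h = np.zeros(hid)
    preds = []
    for skill, correct in interactions:
        x = np.zeros(in_dim)
        x[skill + n_skills * int(correct)] = 1.0  # one-hot (skill, response)
        h = np.tanh(x @ Wxh + h @ Whh)            # vanilla RNN step
        preds.append(sigmoid(h @ Why))            # P(correct) for each skill
    return np.array(preds)

# One student's history: answered skill 0 correctly, skill 2 incorrectly, ...
p = dkt_forward([(0, 1), (2, 0), (0, 1)])
```

DKT as published uses an LSTM rather than a vanilla RNN cell, but the input/output structure is the same; the uncertainty machinery discussed later attaches to the per-skill output probabilities.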
Section snippets
Related work
In this section, we first discuss how to incorporate uncertainties into deep models, focusing on epistemic and aleatoric uncertainty. We then briefly review the deep models used for knowledge tracing. Our modeling builds upon these foundational works.
Uncertainties for knowledge tracing
In the previous section we introduced two kinds of uncertainty: epistemic and aleatoric. We now demonstrate how to incorporate these uncertainties into a deep knowledge tracing model. In the context of knowledge tracing, one can think of aleatoric uncertainty as being caused by the difficulty level of an item and the corresponding guess-ability of an item. For example, items that are too easy or too difficult will have lower uncertainties because students are more likely to get them correct or incorrect, respectively.
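The snippet above does not reproduce the paper's exact regularizer, but the general recipe follows Kendall-style learned loss attenuation: the network emits a per-example log-variance s alongside the response probability, the cross-entropy term is scaled by exp(-s), and a penalty on s stops the model from declaring everything uncertain. The sketch below is a hypothetical illustration of that recipe; the function name, the λ weight, and the specific functional form are assumptions, not the authors' exact loss.

```python
import numpy as np

def uncertainty_regularized_bce(p, y, log_var, lam=1.0):
    """Hypothetical uncertainty-weighted loss (NOT the paper's exact form).

    p       : predicted probabilities of a correct response
    y       : observed responses (0/1)
    log_var : per-example learned log-variance s (the confidence output)
    lam     : weight of the regularization term penalizing large s
    """
    eps = 1e-7
    bce = -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    # Attenuated cross-entropy plus a penalty on claimed uncertainty.
    return float(np.mean(np.exp(-log_var) * bce + lam * log_var))

# Same correct prediction, two claimed uncertainty levels:
loss_confident = uncertainty_regularized_bce(np.array([0.9]), np.array([1.0]),
                                             np.array([0.0]))
loss_uncertain = uncertainty_regularized_bce(np.array([0.9]), np.array([1.0]),
                                             np.array([2.0]))
```

Per example, this loss is minimized at s = log(BCE/λ), so well-predicted items are pushed toward low (even negative) log-variance while hard, frequently missed items settle at higher log-variance, which is the learned confidence behavior described above.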
Datasets
We evaluate our methodology on several datasets and compare our model with the model proposed by Kendall et al. [6]. The specifics of each dataset are given in Table 1. We removed students with fewer than three attempts.
ASSISTments skill builder 09–10 and 14–15: These are considered the benchmark datasets [24] for comparing knowledge tracing models. A student is considered to have mastered a skill when meeting some criterion, say correctly answering three problems related to that skill in a row.
Results
The goal of our study, as mentioned above, is to build a model capable of providing a reasonable confidence level for each prediction. If we interpret the output variances as a measure of confidence, lower variance should signify higher confidence. We first empirically show why Monte Carlo sampling alone may not help learn the data-dependent variance. Then, we compare the aleatoric and epistemic uncertainties generated using Kendall's method and our proposed method.
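When both uncertainty types are modeled, the standard way to combine them into a single predictive variance is the usual decomposition: the variance of the MC-sample means (epistemic) plus the mean of the predicted noise variances (aleatoric). The toy NumPy sketch below simulates the per-pass outputs; the numbers are illustrative stand-ins, not results from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Suppose T stochastic forward passes each return (mean, aleatoric_var)
# for one prediction; these arrays stand in for MC-dropout outputs.
T = 50
mc_means = 0.7 + 0.05 * rng.normal(size=T)    # spread across sampled nets
mc_alea = 0.02 + 0.005 * rng.random(size=T)   # predicted per-pass data noise

epistemic = mc_means.var()         # disagreement between sampled networks
aleatoric = mc_alea.mean()         # average predicted observation noise
total_var = epistemic + aleatoric  # standard predictive-variance split
```

This split is what makes the comparison in this section possible: epistemic variance shrinks with more training data, while aleatoric variance reflects irreducible noise in student responses.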
Conclusion
In this study, we proposed a method for incorporating prediction uncertainties into deep neural network models in the context of knowledge tracing. We first empirically showed that Monte Carlo sampling alone may not help learn the data-dependent variance. Next, we proposed our methodology of explicitly regularizing the loss function to incentivize expected behavior. We evaluated our methodology on three different real datasets. The results show that our model provides comparable results to existing DKT models while also producing reasonable uncertainty estimates.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
CRediT authorship contribution statement
Xinyi Ding: Conceptualization, Methodology, Software, Writing - original draft. Eric C. Larson: Supervision, Methodology, Writing - review & editing.
Mr. Ding is a Ph.D. candidate in the Department of Computer Science at SMU. He received his master's degree in Computer Science from the University of Texas at Dallas in May 2013. His research interests include ubiquitous computing, machine learning, and educational data mining.
References (27)
- et al., Aleatoric uncertainty estimation with test-time augmentation for medical image segmentation with convolutional neural networks, Neurocomputing (2019)
- et al., Pairing an arbitrary regressor with an artificial neural network estimating aleatoric uncertainty, Neurocomputing (2019)
- A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, in: Advances...
- et al., Long short-term memory, Neural Comput. (1997)
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you...
- J. Weston, S. Chopra, A. Bordes, Memory networks, arXiv preprint...
- et al., Learning representations by back-propagating errors, Cogn. Model. (1988)
- A. Kendall, Y. Gal, What uncertainties do we need in bayesian deep learning for computer vision?, in: Advances in...
- Y. Gal, Uncertainty in deep learning (Ph.D. thesis), University of Cambridge,...
- A. Graves, Practical variational inference for neural networks, in: Advances in Neural Information Processing Systems,...
- Knowledge tracing: modeling the acquisition of procedural knowledge, User Model. User-adapted Interaction
Dr. Larson is an Associate Professor in the Department of Computer Science at SMU. He joined SMU in August 2013 after he received his PhD from the University of Washington. He is a fellow of the Hunt Institute for Engineering Humanity, member of the Darwin Deason Institute for Cybersecurity, Center for Global Health, and SMU AT&T Center for Virtualization. His research explores the interdisciplinary relationship of machine learning and signal/image processing with the fields of security, mobile health, education, chemistry, psycho-visual psychology, human-computer interaction, and ubiquitous computing.