Incorporating uncertainties in student response modeling by loss function regularization
Introduction
The field of deep learning has exploded in recent years, with the study of numerous architectures such as convolutional neural networks (CNNs) for imaging and computer vision [1], variants of recurrent neural network (RNN) models [2], attention mechanisms [3], and memory networks [4]. Despite the progress deep neural network based models have made, most researchers still treat these models as black boxes. Many, including us, have argued that understanding a model and how it makes decisions is as important as its performance. Can we trust these predictions? When a model is not certain about its predictions, we could offload decisions to human experts or to more traditional algorithms. In this study, we address this problem by developing a loss function regularization term for deep learning models that learns an uncertainty scale (or confidence level) for each prediction.
Deep learning models are primarily trained using backpropagation [5], and once trained, the weights are fixed. Thus, we can think of these models as deterministic functions: for each input, there is always a corresponding output. But what if we have an input that is far from the distribution of the training data? How do we interpret the output? A real-world example is a 2016 car accident in which an assisted driving system confused the bright side of a trailer with the sky [6]. Directly targeting this problem is difficult. A more feasible approach is to make the model learn some characteristics of the input data in a supervised or unsupervised manner. Several works have tried to incorporate uncertainties into deep models [6], [7], [8]; they all tackle this problem using a Bayesian approach.
The Bayesian approach to deep learning, or Bayesian deep learning, assumes the weights of a deep neural network are not fixed but are instead sampled from a probability distribution (say, a Gaussian). The prediction is calculated by integrating over the posterior probability distribution. However, this integration is usually intractable, so approximation methods are often used. In this study, we consider two types of uncertainty: epistemic uncertainty (model uncertainty), which arises because numerous models could have generated the dataset at hand, and aleatoric uncertainty (input-dependent uncertainty), which is caused by noise in the observations or measurements. We can estimate epistemic uncertainty by sampling the weights of a network using Monte Carlo methods [6]. Aleatoric uncertainty can be estimated by applying a noise distribution to the output of the network.
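As a concrete illustration of the Monte Carlo weight-sampling idea, epistemic uncertainty is commonly approximated with MC dropout: dropout is kept active at test time, several stochastic forward passes are run, and the spread of the outputs estimates model uncertainty. The sketch below uses a toy, untrained NumPy network; the layer sizes, dropout rate, and number of passes are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer network with fixed (untrained) weights.  Dropout
# stays ON at test time (MC dropout), so each forward pass samples a
# different sub-network; output spread estimates epistemic uncertainty.
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(4, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mc_forward(x, p_drop=0.5):
    h = np.maximum(x @ W1, 0.0)          # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop  # fresh Bernoulli dropout mask
    h = h * mask / (1.0 - p_drop)        # inverted-dropout scaling
    return sigmoid(h @ W2)

x = rng.normal(size=(1, 8))
samples = np.stack([mc_forward(x) for _ in range(100)])  # T = 100 passes
pred_mean = samples.mean(axis=0)      # prediction
epistemic_var = samples.var(axis=0)   # model (epistemic) uncertainty
```

Inputs far from the training distribution tend to produce larger disagreement between the sampled sub-networks, which is exactly the signal one wants for the out-of-distribution problem described above.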
In this paper, we investigate these two kinds of uncertainty in the context of student response modeling, more specifically, knowledge tracing. We select knowledge tracing because a synthetic dataset can be used to evaluate uncertainties quantitatively. Moreover, the two most popular domains in which these deep learning models are evaluated are image processing (computer vision) and natural language processing. We argue that knowledge tracing, as a core component of building intelligent tutoring systems, should receive more visibility from the machine learning community because it is an intriguing, complex problem with a diverse set of interested parties.
Knowledge tracing allows the inference of a student's mastery level of a skill and is thus key to building intelligent tutoring systems. Conventional statistical models include Bayesian Knowledge Tracing (BKT) [9] and Performance Factor Analysis (PFA) [10]. Deep Knowledge Tracing (DKT) [11], which uses recurrent neural networks to model skills, and the Dynamic Key-Value Memory Network (DKVMN) [12], which is based on memory networks, have also been proposed in recent years. However, these deep learning based knowledge tracing models are difficult to explain and are not transparent. In this paper, we provide an uncertainty score for each prediction, thus helping to alleviate concerns caused by these black box models. We summarize our contributions as follows:
- We empirically show that using Monte Carlo sampling alone is insufficient to model data variance in knowledge tracing applications.
- We show how to incorporate sensible uncertainties in deep knowledge tracing by explicitly regularizing the loss function. To the best of our knowledge, this is the first work that evaluates uncertainties in deep knowledge tracing.
- We evaluate our methodology on three different real datasets as well as in a more controlled way using a novel synthetic dataset. The results show that our model produces reasonable uncertainties as well as comparable results to existing DKT models. Using our synthetic data, we show that our uncertainty modeling follows the underlying generated uncertainties more closely than other uncertainty modeling approaches.
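For context, a DKT-style model consumes a student's sequence of (skill, correctness) interactions and, at each step, outputs a probability of answering each skill correctly. The minimal NumPy sketch below uses randomly initialized (untrained) weights and a vanilla RNN cell; all sizes and the encoding scheme are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_skills = 5
in_dim = 2 * n_skills   # one-hot over (skill, correct/incorrect) pairs
hid = 16

# Randomly initialized weights stand in for trained parameters.
Wxh = rng.normal(scale=0.1, size=(in_dim, hid))
Whh = rng.normal(scale=0.1, size=(hid, hid))
Why = rng.normal(scale=0.1, size=(hid, n_skills))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dkt_forward(interactions):
    """interactions: list of (skill_id, correct) pairs for one student."""
    h = np.zeros(hid)
    preds = []
    for skill, correct in interactions:
        x = np.zeros(in_dim)
        x[skill + n_skills * int(correct)] = 1.0  # one-hot (skill, response)
        h = np.tanh(x @ Wxh + h @ Whh)            # vanilla RNN step
        preds.append(sigmoid(h @ Why))            # P(correct) for each skill
    return np.array(preds)

# One student's history: answered skill 0 correctly, skill 2 incorrectly, ...
p = dkt_forward([(0, 1), (2, 0), (0, 1)])
```

DKT as published uses an LSTM rather than a vanilla RNN cell, but the input/output structure is the same; the uncertainty machinery discussed later attaches to the per-skill output probabilities.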
Section snippets
Related work
In this section, we first discuss how to incorporate uncertainties into deep models, focusing on epistemic and aleatoric uncertainty. We then briefly review the deep models used for knowledge tracing. Our modeling builds upon these foundational works.
Uncertainties for knowledge tracing
In the previous section we introduced two kinds of uncertainty: epistemic and aleatoric. We now demonstrate how to incorporate these uncertainties into a deep knowledge tracing model. In the context of knowledge tracing, one can think of aleatoric uncertainty as being caused by the difficulty level of an item and the corresponding guess-ability of an item. For example, items that are too easy or too difficult will have lower uncertainties because students are more likely to get them correct or incorrect, respectively.
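The snippet above does not reproduce the paper's exact regularizer, but the general recipe follows Kendall-style learned loss attenuation: the network emits a per-example log-variance s alongside the response probability, the cross-entropy term is scaled by exp(-s), and a penalty on s stops the model from declaring everything uncertain. The sketch below is a hypothetical illustration of that recipe; the function name, the λ weight, and the specific functional form are assumptions, not the authors' exact loss.

```python
import numpy as np

def uncertainty_regularized_bce(p, y, log_var, lam=1.0):
    """Hypothetical uncertainty-weighted loss (NOT the paper's exact form).

    p       : predicted probabilities of a correct response
    y       : observed responses (0/1)
    log_var : per-example learned log-variance s (the confidence output)
    lam     : weight of the regularization term penalizing large s
    """
    eps = 1e-7
    bce = -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    # Attenuated cross-entropy plus a penalty on claimed uncertainty.
    return float(np.mean(np.exp(-log_var) * bce + lam * log_var))

# Same correct prediction, two claimed uncertainty levels:
loss_confident = uncertainty_regularized_bce(np.array([0.9]), np.array([1.0]),
                                             np.array([0.0]))
loss_uncertain = uncertainty_regularized_bce(np.array([0.9]), np.array([1.0]),
                                             np.array([2.0]))
```

Per example, this loss is minimized at s = log(BCE/λ), so well-predicted items are pushed toward low (even negative) log-variance while hard, frequently missed items settle at higher log-variance, which is the learned confidence behavior described above.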
Datasets
We evaluate our methodology on several datasets and compare our model with the model proposed by Kendall et al. [6]. The specifics of each dataset are given in Table 1. We removed students with fewer than three attempts.
ASSISTments skill builder 09–10 and 14–15: These are considered the benchmark datasets [24] for comparing knowledge tracing models. A student is considered to have mastered a skill when meeting some criterion, say correctly answering three problems related to that skill in a row.
Results
The goal of our study, as mentioned above, is to build a model capable of providing a reasonable confidence level for each prediction. If we interpret the output variances as a measure of confidence, lower variance should signify higher confidence. We first empirically show why Monte Carlo sampling alone may not help learn the data-dependent variance. Then, we compare the aleatoric and epistemic uncertainties generated using Kendall's method and our proposed method.
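When both uncertainty types are modeled, the standard way to combine them into a single predictive variance is the usual decomposition: the variance of the MC-sample means (epistemic) plus the mean of the predicted noise variances (aleatoric). The toy NumPy sketch below simulates the per-pass outputs; the numbers are illustrative stand-ins, not results from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Suppose T stochastic forward passes each return (mean, aleatoric_var)
# for one prediction; these arrays stand in for MC-dropout outputs.
T = 50
mc_means = 0.7 + 0.05 * rng.normal(size=T)    # spread across sampled nets
mc_alea = 0.02 + 0.005 * rng.random(size=T)   # predicted per-pass data noise

epistemic = mc_means.var()         # disagreement between sampled networks
aleatoric = mc_alea.mean()         # average predicted observation noise
total_var = epistemic + aleatoric  # standard predictive-variance split
```

This split is what makes the comparison in this section possible: epistemic variance shrinks with more training data, while aleatoric variance reflects irreducible noise in student responses.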
Conclusion
In this study, we proposed a method for incorporating prediction uncertainties into deep neural network models in the context of knowledge tracing. We first empirically showed that Monte Carlo sampling alone may not help learn the data-dependent variance. Next, we proposed our methodology of explicitly regularizing the loss function to incentivize expected behavior. We evaluated our methodology on three different real datasets. The results show that our model provides comparable results to existing DKT models while also producing reasonable uncertainty estimates.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
CRediT authorship contribution statement
Xinyi Ding: Conceptualization, Methodology, Software, Writing - original draft. Eric C. Larson: Supervision, Methodology, Writing - review & editing.
Mr. Ding is a Ph.D. candidate in the Department of Computer Science at SMU. He received his master's degree in Computer Science from the University of Texas at Dallas in May 2013. His research interests include ubiquitous computing, machine learning, and educational data mining.
References (27)
- et al., Aleatoric uncertainty estimation with test-time augmentation for medical image segmentation with convolutional neural networks, Neurocomputing (2019)
- et al., Pairing an arbitrary regressor with an artificial neural network estimating aleatoric uncertainty, Neurocomputing (2019)
- A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, in: Advances...
- et al., Long short-term memory, Neural Comput. (1997)
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you...
- J. Weston, S. Chopra, A. Bordes, Memory networks, arXiv preprint...
- et al., Learning representations by back-propagating errors, Cogn. Model. (1988)
- A. Kendall, Y. Gal, What uncertainties do we need in bayesian deep learning for computer vision?, in: Advances in...
- Y. Gal, Uncertainty in deep learning (Ph.D. thesis), University of Cambridge,...
- A. Graves, Practical variational inference for neural networks, in: Advances in Neural Information Processing Systems,...
- Knowledge tracing: modeling the acquisition of procedural knowledge, User Model. User-adapted Interaction
Dr. Larson is an Associate Professor in the Department of Computer Science at SMU. He joined SMU in August 2013 after he received his PhD from the University of Washington. He is a fellow of the Hunt Institute for Engineering Humanity, member of the Darwin Deason Institute for Cybersecurity, Center for Global Health, and SMU AT&T Center for Virtualization. His research explores the interdisciplinary relationship of machine learning and signal/image processing with the fields of security, mobile health, education, chemistry, psycho-visual psychology, human-computer interaction, and ubiquitous computing.