Neurocomputing
Volume 409, 7 October 2020, Pages 74-82

Incorporating uncertainties in student response modeling by loss function regularization

https://doi.org/10.1016/j.neucom.2020.05.035

Abstract

Most existing deep learning research focuses on improving performance by developing new architectures or regularizers. In this paper, by contrast, we study the modeling of uncertainty in recurrent networks for the application of student response modeling, more specifically, knowledge tracing. Knowledge tracing is a time-series machine learning problem: it infers a student's mastery level of each skill as they work through a question bank, so that the curriculum can be adjusted for efficient learning. Deep Knowledge Tracing (DKT) takes the deep learning approach to knowledge tracing and has achieved better results than models like Bayesian Knowledge Tracing (BKT) and Performance Factor Analysis (PFA). However, the opaqueness of these deep knowledge tracing models has also drawn criticism. Providing an uncertainty score for each prediction helps mitigate this opaqueness. To investigate uncertainty modeling in DKT, we first examine a popular way of modeling data-dependent uncertainties using Monte Carlo sampling and show how it is insufficient to model variance in the data. Second, we show how to incorporate sensible uncertainties by explicitly regularizing the cross-entropy loss function. Third, we evaluate our method both on three different real datasets and, in a more controlled way, on synthetic data. Using synthetic data allows us to quantitatively understand the generated uncertainties. The results show that our method provides results comparable to standard deep knowledge tracing models as well as meaningful prediction uncertainties.

Introduction

The field of deep learning has exploded in recent years, with the study of numerous architectures such as convolutional neural networks (CNN) for imaging and computer vision [1], variants of recurrent neural network (RNN) models [2], attention mechanisms [3], and memory networks [4]. Despite the progress deep neural network based models have made, most researchers still treat these models as black boxes. Many, including us, have argued that understanding a model and how it makes decisions is as important as its performance. Can we trust these predictions? When a model is not certain about its predictions, we could offload decisions to human experts or more traditional algorithms. In this study, we address this problem by developing a loss function regularization term for deep learning models that learns an uncertainty scale (or confidence level) for each prediction.

Deep learning models are primarily trained using backpropagation [5], and once trained, the weights are fixed. Thus, we can think of these models as deterministic functions: for each input xi, there is always a corresponding output yi. But what if an input xi lies far from the distribution of the training data? How do we interpret the output? A real-world example is a 2016 car accident in which an assisted driving system confused the bright side of a trailer with the sky [6]. Directly targeting this problem is difficult. A more feasible approach is to make the model learn some characteristics of the input data in a supervised or unsupervised manner. Several works have tried to incorporate uncertainties into deep models [6], [7], [8]; they all tackle this problem using a Bayesian approach.

The Bayesian approach to deep learning, or Bayesian deep learning, assumes the weights of a deep neural network are not fixed but are instead sampled from a probability distribution (say, a Gaussian). The prediction is calculated by integrating over the posterior distribution of the weights. However, this integration is usually intractable, so approximation methods are often used. In this study, we consider two types of uncertainty: epistemic (or model) uncertainty, which arises because numerous models could have generated the dataset at hand, and aleatoric (or input data dependent) uncertainty, which arises from noise in the observations or measurements. Epistemic uncertainty can be estimated by sampling the weights of a network using Monte Carlo methods [6]. Aleatoric uncertainty can be estimated by placing a noise distribution over the output of the network.
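One common instantiation of this weight sampling is Monte Carlo dropout: dropout is kept active at inference time and each stochastic forward pass is treated as a draw from the approximate weight posterior. The sketch below (in PyTorch; the `model`, output activation, and sample count are illustrative assumptions, not the configuration used in this paper) shows the basic recipe.

```python
import torch

def mc_dropout_predict(model, x, n_samples=50):
    """Estimate epistemic uncertainty with Monte Carlo dropout:
    keep dropout stochastic at inference and aggregate repeated passes."""
    model.train()  # keeps dropout layers active (assumes no batch-norm layers)
    with torch.no_grad():
        samples = torch.stack([torch.sigmoid(model(x)) for _ in range(n_samples)])
    mean = samples.mean(dim=0)       # predictive probability
    epistemic = samples.var(dim=0)   # variance across sampled weights
    return mean, epistemic
```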

In this paper, we investigate these two kinds of uncertainty in the context of student response modeling, more specifically, knowledge tracing. We select knowledge tracing because a synthetic dataset can be used to evaluate uncertainties quantitatively. Moreover, the two most popular domains in which such deep learning models are evaluated are image processing (computer vision) and natural language processing. We argue that knowledge tracing, as a core component of building intelligent tutoring systems, deserves more visibility from the machine learning community because it is an intriguing, complex problem with a diverse set of interested parties.

Knowledge tracing allows the inference of a student's mastery level of a skill and is thus key to building intelligent tutoring systems. Conventional statistical models include Bayesian Knowledge Tracing (BKT) [9] and Performance Factor Analysis (PFA) [10]. In recent years, deep models have also been proposed: Deep Knowledge Tracing (DKT) [11], which uses recurrent neural networks to model skills, and the Dynamic Key-Value Memory Network (DKVMN) [12], which is based on memory networks (a minimal sketch of the recurrent formulation is given after the contribution list below). However, these deep learning based knowledge tracing models are difficult to explain and are not transparent. In this paper, we provide an uncertainty score for each prediction, helping to alleviate concerns caused by these black box models. We summarize our contributions as follows:

  • We empirically show that using Monte Carlo alone is insufficient to model data variance in knowledge tracing applications.

  • We show how to incorporate sensible uncertainties in deep knowledge tracing by explicitly regularizing the loss function. To the best of our knowledge, this is the first work that evaluates uncertainties in deep knowledge tracing.

  • We evaluate our methodology on three different real datasets as well as, in a more controlled way, on a novel synthetic dataset. The results show that our model produces reasonable uncertainties as well as results comparable to existing DKT models. Using our synthetic data, we show that our uncertainty modeling follows the underlying generated uncertainties more closely than other uncertainty modeling approaches.
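For context, the sketch below outlines the recurrent formulation used by DKT [11]: each student interaction is one-hot encoded over (skill, correctness) pairs, passed through an LSTM, and mapped to a per-skill probability of answering the next item correctly. The hidden size and layer choices are illustrative only, not the configuration evaluated in this paper.

```python
import torch
import torch.nn as nn

class DKT(nn.Module):
    """Minimal DKT-style model: LSTM over one-hot (skill, correctness)
    interactions, sigmoid output per skill (illustrative sizes only)."""
    def __init__(self, num_skills, hidden_size=200):
        super().__init__()
        # input dimension is 2 * num_skills: one slot per (skill, correct/incorrect)
        self.lstm = nn.LSTM(2 * num_skills, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, num_skills)

    def forward(self, x):                  # x: (batch, seq_len, 2 * num_skills)
        h, _ = self.lstm(x)
        return torch.sigmoid(self.out(h))  # P(correct) for every skill at each step
```

During training, the prediction at time step t is typically scored against the student's observed response at step t+1 for the attempted skill.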

Section snippets

Related work

In this section we first discuss how to incorporate uncertainties into deep models, focusing on epistemic and aleatoric uncertainty. We then briefly review the deep models used for knowledge tracing. Our modeling builds upon these foundational works.

Uncertainties for knowledge tracing

In the previous section we introduced two kinds of uncertainty: epistemic and aleatoric. We now demonstrate how to incorporate these uncertainties into a deep knowledge tracing model. In the context of knowledge tracing, one can think of aleatoric uncertainty as being caused by the difficulty level of an item and the corresponding guess-ability of an item. For example, items that are too easy or too difficult will have lower uncertainties because students are more likely to get them correct or
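For reference, in the baseline of Kendall et al. [6] that we compare against later, the network outputs a logit and a log-variance for each prediction; the logit is corrupted with Gaussian noise of the learned scale, and the likelihood is averaged over Monte Carlo samples. The sketch below is a simplified binary-output variant (it averages the loss over samples rather than using the exact log-sum-exp formulation of [6]); variable names and sample counts are our own illustrative choices.

```python
import torch
import torch.nn.functional as F

def attenuated_bce_loss(logit, log_var, target, n_samples=20):
    """Kendall-style aleatoric loss for binary responses: sample noisy logits
    with a learned per-prediction variance and average the cross entropy,
    so the model can 'explain away' noisy items by inflating log_var."""
    std = torch.exp(0.5 * log_var)
    losses = []
    for _ in range(n_samples):
        noisy_logit = logit + std * torch.randn_like(logit)  # corrupt the logit
        losses.append(F.binary_cross_entropy_with_logits(noisy_logit, target))
    return torch.stack(losses).mean()
```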

Datasets

We evaluate our methodology on several datasets and compare our model with the model proposed by Kendall et al. [6]. The specifics of each dataset are given in Table 1. We have removed students with fewer than three attempts.
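The attempt-count filter amounts to a single groupby in pandas; the file path and column name below are assumptions about the dataset layout, not documented fields of the logs.

```python
import pandas as pd

# Keep only students with at least three logged attempts.
# "user_id" and the CSV path are assumed names; adjust to the actual schema.
df = pd.read_csv("skill_builder_09_10.csv")
df = df.groupby("user_id").filter(lambda g: len(g) >= 3)
```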

Assistment skill builder 09–10 and 14–15: This is considered the benchmark dataset [24] for comparing knowledge tracing models. A student is considered to have mastered a skill when meeting some criterion, say correctly answering three problems related to that skill in a

Results

The goal of our study, as mentioned above, is to build a model capable of providing a reasonable confidence level for each prediction. If we interpret the output variances as a measure of confidence, lower variance should signify higher confidence. We first empirically show why Monte Carlo sampling alone may not help learn the data dependent variance. Then, we compare the aleatoric and epistemic uncertainties generated using Kendall’s method and our proposed method.
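When reading these comparisons, it helps to keep the standard Monte Carlo decomposition in mind (our illustration of the usual estimator, not necessarily the exact computation in this paper): epistemic uncertainty is the spread of the sampled predictive means, while aleatoric uncertainty is the average of the learned per-sample variances.

```python
import torch

def split_uncertainty(prob_samples, var_samples):
    """prob_samples: (T, ...) probabilities from T stochastic forward passes.
    var_samples:  (T, ...) learned aleatoric variances from the same passes.
    Returns epistemic (variance of the means) and aleatoric (mean of the
    learned variances); lower totals correspond to higher confidence."""
    epistemic = prob_samples.var(dim=0)
    aleatoric = var_samples.mean(dim=0)
    return epistemic, aleatoric
```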

Conclusion

In this study, we proposed a method for incorporating prediction uncertainties into deep neural network models in the context of knowledge tracing. We first empirically showed that Monte Carlo sampling alone may not help learn the data dependent variance. Next, we proposed our methodology by explicitly regularizing the loss function to incentivize expected behavior. We evaluated our methodology on three different real datasets. The results show that our model can provide comparable results as

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Xinyi Ding: Conceptualization, Methodology, Software, Writing - original draft. Eric C. Larson: Supervision, Methodology, Writing - review & editing.


References (27)

  • A.T. Corbett et al., Knowledge tracing: modeling the acquisition of procedural knowledge, User Modeling and User-Adapted Interaction (1994).
  • P.I. Pavlik Jr., H. Cen, K.R. Koedinger, Performance factors analysis – a new alternative to knowledge tracing. Online...
  • C. Piech, J. Bassen, J. Huang, S. Ganguli, M. Sahami, L.J. Guibas, J. Sohl-Dickstein, Deep knowledge tracing, in:...

Mr. Ding is a Ph.D. candidate in the Department of Computer Science at SMU. He received his master's degree in Computer Science from the University of Texas at Dallas in May 2013. His research interests include ubiquitous computing, machine learning, and educational data mining.

Dr. Larson is an Associate Professor in the Department of Computer Science at SMU. He joined SMU in August 2013 after receiving his Ph.D. from the University of Washington. He is a fellow of the Hunt Institute for Engineering Humanity and a member of the Darwin Deason Institute for Cybersecurity, the Center for Global Health, and the SMU AT&T Center for Virtualization. His research explores the interdisciplinary relationship of machine learning and signal/image processing with the fields of security, mobile health, education, chemistry, psycho-visual psychology, human-computer interaction, and ubiquitous computing.
