Computers in Human Behavior

Volume 58, May 2016, Pages 119-129

Full length article
Temporal prediction of dropouts in MOOCs: Reaching the low hanging fruit through stacking generalization

https://doi.org/10.1016/j.chb.2015.12.007

Highlights

  • Propose a temporal modeling approach for students' dropout behavior in MOOCs.

  • Demonstrate the advantage of an appended-features modeling space based on PCA over a summed-features modeling space.

  • Explore the power of the ensemble learning method (stacking generalization) in enhancing prediction ability.

Abstract

Massive open online courses (MOOCs) have recently taken center stage in discussions surrounding online education, both in terms of their potential and their high dropout rates. The high attrition rates associated with MOOCs have often been described in terms of a scale-efficacy tradeoff. Building from the large numbers associated with MOOCs and the ability to track individual student performance, this study takes an initial step toward a mechanism for the early and accurate identification of students at risk of dropping out. Focusing on struggling students who remain active in course discussion forums, and who are therefore already more likely to finish a course, we design a temporal modeling approach that prioritizes at-risk students in order of their likelihood of dropping out of a course. In identifying only a small subset of at-risk students, we seek to provide systematic insight for instructors so that they may better provide targeted support for those students most in need of intervention. Moreover, we propose appending historical features to the current week's features for model building, and we introduce principal component analysis (PCA) to identify the breakpoint at which to turn off the features of previous weeks. This appended modeling method is shown to outperform simpler temporal models that merely sum features. To deal with the kind of data variability presented by MOOCs, this study illustrates the effectiveness of an ensemble stacking generalization approach in building more robust and accurate prediction models than the direct application of base learners.

Introduction

Online education is one of the fastest growing segments in education, with one particular form of it – massive open online courses (MOOCs) – recently taking center stage in discussions of its future. A MOOC is usually “massive, with theoretically no limit to enrollment; open, allowing anyone to participate, usually at no cost; online, with learning activities typically taking place over the web; and a course, structured around a set of learning goals in a defined area of study” (Educause, 2013, p.1). These courses are most often offered through platforms such as Coursera, edX, and Udacity, which support teachers as they deploy courses that may scale up to hundreds or even thousands of students. Growing out of the Open Educational Resources (OER) movement, MOOCs are gaining popularity in large measure because they provide a specific means of achieving more equitable access to learning. So far, however, there is little evidence that the potential imagined for MOOCs is being realized. The high attrition rates of MOOCs (ranging from 91% to 93%) have often been highlighted as a scale-efficacy tradeoff (Onah, Sinclair, & Boyatt, 2014).

While MOOCs demonstrate the potential of using the Internet to make education available to a broader base, the large number of students who enroll in (and drop out of) each course raises methodological difficulties for instructors as they work to identify academically at-risk students and provide in-time interventions. To some extent, the scaling up of learning in MOOCs can be considered a sacrifice of pedagogical support (Brinton et al., 2013). It is almost impossible to offer the same quality of support in a class of five thousand as in a class of fifty, owing to the difficulty of collecting and analyzing data from such a large number of students. This situation may become worse given that traditional educational researchers and practitioners have relied on methods such as surveys, interviews, focus groups, and observations for data collection, methods which are time consuming and limited when conducted at scale (Xing et al., 2015b). Further, such methods are unable to support timely interventions for at-risk students, which, given their high dropout rates, is a central concern for MOOCs. The emerging fields of learning analytics and educational data mining (Siemens and Baker, 2012, Xing et al., 2014, Xing et al., 2015c) offer promise in solving the dropout problem in MOOCs. In particular, learning analytics and educational data mining techniques enable the analysis of low-level trace data concerning students' interactions with a course and with other students (Xing and Goggins, 2015, Chen et al., 2015). From this kind of low-level structured data, it is possible to automatically infer higher-level student behavior (e.g. dropout) in order to inform educational decision-making (e.g. intervention). The automatic nature of methods based on learning analytics and educational data mining has the potential to meet the challenge of large scale in MOOCs while also satisfying the requirement of supporting timely interventions.

However, existing studies that employ clickstream data to examine student dropout in MOOCs have mostly focused on summative measures of attrition, overlooking the temporal requirements of designing and implementing interventions. By performing a correlation analysis between course completion and trace-data evidence of engagement in the course, many studies have attempted to identify factors that predict completion of a MOOC (e.g. Alraimi et al., 2015, Yang et al., 2013). Similarly, preliminary prediction research applies all-time trace data to identify which students may drop out. Unfortunately, such studies cannot meet the requirement that interventions be implemented early enough in a course to be effective (Halawa, Greene, & Mitchell, 2014). By the same token, the sheer number of students dropping out of a course renders prediction methods that depend only on the first week, or only on certain points in time, less effective. Even though such a model can detect in a timely manner whether students are at risk of dropping out (Jiang et al., 2014), it cannot predict exactly when a student will drop out. That is, while thousands of students may be flagged as at-risk after the first week, such methods cannot indicate which ones are in danger of dropping out after only the first week and which ones will remain active for two, three, or four weeks, only to eventually drop out of the course. Such a method, while effective at predicting all the students who may eventually drop out of a course, does not support teachers in identifying those students in need of immediate intervention. Given this, constructing a temporal prediction model is critical: such a model would place at-risk students in chronological order of when they are most at risk of dropping out, so that teachers can provide timely intervention to the students most at risk at any given time.

Addressing the temporal features of a prediction model is significant not only because it allows for early detection of dropout students but also because of the gradual nature of attrition in MOOCs. This gradual attrition is especially the case for students who participate in course discussion forums (Yang et al., 2013). Although a major portion of participants who drop out either never engage in course activities at all or leave after the first week, a large fraction remain in the course for several weeks only to drop out later, a pattern suggesting a struggle to stay involved. This struggle is seen most directly in students participating in the online discussion forums. These students, taking part in a massive community of strangers, lack the kinds of shared practices that help to form supportive bonds of interaction and are easily overwhelmed by the volume of discussion present in the forums (Rosé et al., 2014). Because such forums are a key aspect of MOOC platforms, students involved with the online discussion boards are more likely to stay through the course. As such, students who are struggling with a course and yet still engaging with the discussion forums represent low-hanging fruit that may be targeted in order to enhance the success rate of a MOOC (Yang et al., 2013). Given this, we propose a method for predicting the gradual falling away from participation in a course, one which focuses in large part on forum participants.

This kind of prediction, from a learning analytics and educational data mining point of view, is usually realized by supervised machine learning algorithms (Goggins, Xing, Chen, Chen, & Wadholm, 2015). To forecast student dropout, a training set of previously labeled (dropout or not dropout) data instances is used to guide the learning process, while another set of labeled instances, the “test set,” is used to measure the quality of the resulting model (Xing, Guo, Petakovic, & Goggins, 2015). Previous studies have applied different algorithms (e.g. logistic regression, Naïve Bayes, decision trees) to perform the prediction. However, applying these machine learning algorithms directly to MOOC data may not adequately respond to the unique characteristics of the data generated in MOOC learning platforms. Owing to their openness, the data generated in MOOCs can vary significantly over time, with the number of students dropping out or completing the course differing substantially from course to course (Brinton et al., 2013). As a result of these fluctuations and this variability in the data, the performance of these algorithms can change significantly. Because of this, a more reliable predictive modeling mechanism is needed.
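As a minimal illustration of this train/test paradigm, the sketch below fits one base learner on labeled student-week instances and scores it on a held-out test set; the feature matrix, labels, and learner choice are placeholder assumptions, not the study's actual data or configuration.

```python
# Minimal sketch: supervised dropout prediction with a held-out test set.
# X holds one row of activity features per student-week; y holds labels
# (0 = dropout, 1 = active). Both are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 6))  # stand-in activity counts
y = rng.integers(0, 2, size=500)                       # stand-in dropout labels

# The test set plays the role of the labeled instances used to measure
# the quality of the learned model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```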

Given the large number of dropouts, the gradual manner in which students fall away from a course, and the variability of MOOC data, this work proposes a temporal prediction model using ensemble machine learning methods, one that aims to accurately and reliably identify struggling students in MOOCs in advance so that teachers can provide timely and quality pedagogical support, harvest the low-hanging fruit of MOOC forum participants, and keep them engaged in the course. Specifically, to address the scale and gradual nature of attrition, we design a temporal modeling method through which we predict who is going to drop out in the following week. In other words, instead of using all the data to identify all the students at risk of dropping out, the model specifically determines the students at risk of dropping out in the following week using data collected from previous weeks. In calling attention only to those students at risk of dropping out in the coming week, this temporal modeling mechanism enables teachers to focus on the small group of students in immediate danger instead of facing an overwhelming list of all the students who may drop out at some point in a course. With this, the teacher can deliver greater support to a smaller number of at-risk students each week. In terms of model building, instead of simply summing features over weeks, an appended-features mechanism based on principal component analysis (PCA) is used to expand the feature space. With regard to the performance of machine learning algorithms on a fluctuating dataset, this study proffers an ensemble approach – stacking generalization – to increase prediction stability and performance.
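To make the mechanics of stacking generalization concrete, the sketch below combines several base learners through a meta-learner trained on their cross-validated predictions, using scikit-learn's StackingClassifier. The particular base learners and meta-learner shown are illustrative assumptions, not the exact ensemble configuration evaluated in this study.

```python
# Minimal sketch of stacking generalization: base learners are combined by
# a meta-learner fit on their out-of-fold predictions. Learner choices here
# are illustrative, not the paper's exact setup.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

base_learners = [
    ("nb", GaussianNB()),
    ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
]

# cv=5 means each base learner's training predictions are generated
# out-of-fold, which lets the meta-learner correct for the base learners'
# individual biases and stabilizes performance on fluctuating data.
stacker = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
# Usage: stacker.fit(X_train, y_train); stacker.predict_proba(X_next_week)
```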

The overall research question for this project is “How and to what extent can we build a prediction model that can accurately and reliably identify struggling students in MOOC forums in advance so that teachers can provide timely and quality pedagogical support to them?” Two specific research questions are raised based on the overall goal:

  1) How can we synthesize the features for temporal model construction to improve the prediction performance?

  2) How can we employ stacking generalization to improve the temporal prediction performance?

The major research goals of this study are 1) to experiment with and demonstrate a temporal modeling approach for students' dropout behavior; 2) to show the advantage of an appended-features modeling space based on PCA over a summed-features modeling space; and 3) to explore the power of the ensemble learning method (stacking generalization) in enhancing prediction ability. The rest of the paper is organized as follows: Section 2 presents related studies and background information. Section 3 describes the data and context of the study. Section 4 describes the research methodology. Section 5 presents the experimental results and analysis. Section 6 discusses the results, and Section 7 concludes the study.


Dropout in MOOCs

In spite of their momentum, student retention remains a serious problem for MOOCs since, due to their open nature, enrollment is open to the general public and consequences for failure in a course are minimal to none. This openness results in a large portion of students registering for the course without ever actually participating in it and students continuously dropping out at virtually every point during the course (Yang et al., 2013). Even though the problem of students registering and then

Context

For this study we focused on a project management course launched in August 2014 and hosted on Canvas. The course lasted eight weeks, with 11 modules and 3617 registered students. Except for the first four modules, which all took place in the first week, each module lasted roughly one week. The end of each module was in most cases accompanied by spaces for online discussion and quizzes. In total, there were 14 discussion forums and 12 multiple-choice quizzes. Due to the number of students

Preprocess

For each week of the course ($n = 1, 2, \ldots, 8$), the dropout label for each of the $S_n$ students active in the current week is calculated by examining whether there is any activity from the student in the immediately following week and beyond. This generates the label vectors $y_n \in \{0, 1\}^{S_n}$, where 0 indicates dropout and 1 indicates active. These label vectors are then associated with the weekly feature vectors. Specifically, [number of discussion posts, number of forum views, number of quiz views, number of
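As a small illustration of this labeling step, the sketch below derives week-n labels from a hypothetical activity log with one row per recorded student action; the table layout, column names, and data are assumptions for illustration.

```python
# Sketch of weekly dropout labeling: a student active in week n is labeled
# 1 (active) if any activity is recorded after week n, else 0 (dropout).
# The log table below is a hypothetical stand-in for real trace data.
import pandas as pd

logs = pd.DataFrame({
    "student": ["a", "a", "b", "b", "c", "a"],
    "week":    [1,   2,   1,   3,   1,   4],
})

def weekly_labels(logs: pd.DataFrame, week: int) -> pd.Series:
    """Label students active in `week` by whether they appear later."""
    active_now = set(logs.loc[logs["week"] == week, "student"])
    active_later = set(logs.loc[logs["week"] > week, "student"])
    return pd.Series({s: int(s in active_later) for s in sorted(active_now)})

print(weekly_labels(logs, week=1))  # a -> 1, b -> 1, c -> 0
```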

Data transformation and PCA analysis

After performing the logarithmic transformation on both the summed and the weekly feature sets, the box plots in Fig. 6 show that the transformation produces fairly non-skewed distributions and greatly reduces outliers. Fig. 6(a) shows the results for the summed features of Week 5 and Fig. 6(b) shows the results for the features of Week 7.

To identify the breakpoint for appending the historical features for the temporal modeling, PCA is implemented to investigate the separability of
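As a sketch of how the appended feature space and the PCA check fit together, the snippet below concatenates each week's features with those of previous weeks, applies the logarithmic transformation, and reports how much variance the leading principal components capture. The shapes, synthetic data, and inspection heuristic are assumptions for illustration, not the study's exact procedure.

```python
# Sketch: appended feature space (weeks 1..n concatenated, not summed),
# log-transformed, then inspected with PCA. Data here is synthetic.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_students, n_feats = 300, 6
weekly = [rng.lognormal(size=(n_students, n_feats)) for _ in range(5)]

# Appended space for week 3: features of weeks 1-3 placed side by side,
# giving an (n_students, 3 * n_feats) matrix rather than a summed
# (n_students, n_feats) matrix.
appended = np.hstack(weekly[:3])
X = np.log1p(appended)  # log(1 + x) tames skew and is defined at zero

pca = PCA(n_components=2).fit(X)
print("explained variance ratio:", pca.explained_variance_ratio_)
# Plotting pca.transform(X) colored by dropout label would show how
# separable at-risk students are in the reduced space week by week.
```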

Discussion

Despite the popularity of MOOCs, completion rates for these classes are dismal in comparison with traditional online education (Brinton et al., 2013). Because of their unique characteristics – the massive number of enrolled students, students dropping out at virtually every point of the course, and the huge variability in the data produced – MOOCs raise difficult methodological questions for educational researchers and teachers as they work to provide timely and quality support for at-risk

Conclusion

This study takes an initial step toward the early and accurate identification of students at-risk of dropping out of a MOOC. Specifically, this work has proposed to focus on those students already demonstrating some engagement with a course through their participation in the discussion forums. By designing a temporal modeling approach which prioritizes at-risk students according to when they are predicted to drop out of a course, we provide a mechanism by which instructors can deal with only a

Acknowledgments

This work is supported by Instructure, the company behind Canvas. There is no potential conflict of interest for the work reported in this paper.


References (40)

  • J. Cheng et al. Learning Bayesian belief network classifiers: Algorithms and system.

  • T.G. Dietterich. Machine learning research: Four current directions. AI Magazine (1997).

  • Educause. Seven things you should know about MOOCs II. Educause Learning Initiative (2013).

  • S. Goggins et al. Learning analytics at "small" scale: Exploring a complexity-grounded model for assessment automation. Journal of Universal Computer Science (2015).

  • C. Gütl et al. Attrition in MOOC: Lessons learned from drop-out students.

  • S. Halawa et al. Dropout prediction in MOOCs using learner activity features.

  • D.J. Hand. Measuring classifier performance: A coherent alternative to the area under the ROC curve. Machine Learning (2009).

  • F.V. Jensen (1996).

  • S. Jiang et al. Predicting MOOC performance with week 1 behavior.

  • I. Jolliffe. Principal component analysis (2002).

Wanli Xing is an Assistant Professor in the Department of Educational Psychology and Leadership, Texas Tech University, USA, with a background in statistics, computer science, and mathematical modeling. His research interests are educational data mining, learning analytics, and CSCL.

Xin Chen is a PhD candidate in the School of Engineering Education, Purdue University. Her research blends social media data mining and visualization, web development, and user experience research and design. She received a BS in Electrical Engineering from East China Normal University.

Jared Stein is the Vice President for Research and Education at Instructure, the company behind Canvas. He has worked in the field of technology-enhanced education for more than 15 years and served as the Director of the Innovation Center at Utah Valley University.

Michael Marcinkowski is a PhD candidate in the College of Information Sciences and Technology at Penn State University. He is involved with socio-technical research pertaining to design and human-computer interaction. His main interest is in hermeneutics and the uses of empirical data in design. He is currently studying the design of online education as it exists within larger social and cultural systems.
