Student dropout prediction in massive open online courses by convolutional neural networks

  • Methodologies and Application
  • Soft Computing

Abstract

Massive open online courses (MOOCs) give learners worldwide access to high-quality educational resources, but persistently high dropout rates seriously undermine their educational effectiveness. Predicting dropout in MOOCs early enough to intervene has therefore become an active research topic in recent years. Traditional methods rely on handcrafted features, which are laborious to design and make the final prediction performance hard to guarantee. To address this problem, this paper proposes an end-to-end dropout prediction model based on convolutional neural networks that integrates feature extraction and classification into a single framework. The model transforms the original timestamp data according to different time windows and extracts features automatically to obtain a better feature representation. Extensive experiments on a public dataset show that our approach achieves results comparable to other dropout prediction methods in terms of precision, recall, F1 score, and AUC score.
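The abstract does not spell out how the timestamp data are transformed, so the following is only a minimal sketch of one way raw timestamped activity records could be binned into fixed time windows to form a matrix that a convolutional model might take as input. The function name `window_counts`, the event names, and the window lengths are illustrative assumptions, not details taken from the paper.

```python
from datetime import datetime, timedelta


def window_counts(records, start, n_windows, window, event_types):
    """Count events per (event type, time window).

    records     -- iterable of (timestamp, event_type) pairs (hypothetical log format)
    start       -- datetime at which the first window begins
    n_windows   -- number of consecutive windows to produce
    window      -- timedelta giving the length of each window
    event_types -- ordered list of event names; one output row per name

    Returns a list of rows, each holding n_windows integer counts.
    """
    row_of = {name: i for i, name in enumerate(event_types)}
    matrix = [[0] * n_windows for _ in event_types]
    for ts, event in records:
        # Floor division of two timedeltas yields an integer window index.
        idx = (ts - start) // window
        # Skip events outside the observed period or of unknown type.
        if 0 <= idx < n_windows and event in row_of:
            matrix[row_of[event]][idx] += 1
    return matrix


# Hypothetical activity log for a single student (assumed event names).
start = datetime(2014, 1, 1)
log = [
    (datetime(2014, 1, 1, 9, 0), "video"),
    (datetime(2014, 1, 1, 21, 30), "problem"),
    (datetime(2014, 1, 2, 8, 15), "video"),
    (datetime(2014, 1, 4, 10, 0), "video"),
]
events = ["video", "problem"]

# The same log viewed through two different window lengths.
daily = window_counts(log, start, 4, timedelta(days=1), events)
two_day = window_counts(log, start, 2, timedelta(days=2), events)
print(daily)    # [[1, 1, 0, 1], [1, 0, 0, 0]]
print(two_day)  # [[2, 1], [1, 0]]
```

Changing the window length changes the granularity of the resulting matrix without touching the raw log, which is the sense in which a single record stream can be viewed "according to different time windows".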

Notes

  1. https://www.coursera.org/.

  2. https://www.edx.org/.

  3. https://www.moodle.org/.

Acknowledgements

This work is supported by the National Social Science Fund of China for Young Project (13CYY037) and Educational Informatization Research Center of Hubei, Central China Normal University. We would like to gratefully acknowledge the organizers of KDD Cup 2015 as well as XuetangX for making the datasets available.

Author information

Corresponding author

Correspondence to Lin Qiu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Qiu, L., Liu, Y., Hu, Q. et al. Student dropout prediction in massive open online courses by convolutional neural networks. Soft Comput 23, 10287–10301 (2019). https://doi.org/10.1007/s00500-018-3581-3

  • DOI: https://doi.org/10.1007/s00500-018-3581-3
