A Survey of Machine Learning Approaches for Student Dropout Prediction in Online Courses

Published: 28 May 2020

Abstract

The recent diffusion of online education (both MOOCs and e-courses) has led to increased economic and scientific interest in e-learning environments. As widely documented, online students are much more likely to drop out than those attending conventional classrooms. It is of paramount interest for institutions, students, and faculty members to find more efficient methodologies to mitigate withdrawals. Following the rise in attention to the Student Dropout Prediction (SDP) problem, the literature has seen a significant increase in contributions on this subject. In this survey, we present an in-depth analysis of the state-of-the-art literature in the field of SDP, from the central, but not exclusive, perspective of machine learning predictive algorithms. Our main contributions are the following: (i) we propose a comprehensive hierarchical classification of existing literature that follows the workflow of design choices in SDP; (ii) to facilitate comparative analysis, we introduce a formal notation to describe uniformly the alternative dropout models investigated by researchers in the field; (iii) we analyse other relevant aspects to which the literature has given less attention, such as evaluation metrics, gathered data, and privacy concerns; (iv) we pay specific attention to deep sequential machine learning methods, recently proposed by some contributors, which represent one of the most effective solutions in this area. Overall, our survey provides novice readers approaching these topics with practical guidance on design choices and directs researchers to the most promising approaches, highlighting current limitations and open challenges in the field.

Information

Published In

ACM Computing Surveys  Volume 53, Issue 3
May 2021
787 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3403423

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 May 2020
Online AM: 07 May 2020
Accepted: 01 March 2020
Revised: 01 March 2020
Received: 01 December 2019
Published in CSUR Volume 53, Issue 3

Author Tags

  1. Student dropout prediction
  2. educational data mining
  3. learning analytics

Qualifiers

  • Survey
  • Research
  • Refereed
