A predictive approach based on efficient feature selection and learning algorithms’ competition: Case of learners’ dropout in MOOCs

Youssef, Mourdi; Mohammed, Sadgal; Hamada, El Kabtane; Wafaa, Berrada Fathi

doi:10.1007/s10639-019-09934-y

Published: 26 June 2019

Education and Information Technologies, Volume 24, pages 3591–3618 (2019)

Mourdi Youssef (ORCID: orcid.org/0000-0003-0999-6388)¹, Sadgal Mohammed¹, El Kabtane Hamada¹ & Berrada Fathi Wafaa¹


Abstract

MOOCs are increasingly involved in the pedagogical experimentation of universities whose infrastructure cannot cope with the growing number of learners. These universities aim to complement their initial training with distance learning courses. Unfortunately, efforts to make this pedagogical model succeed face dropout rates among enrolled learners that reach 90% in some cases, which makes coaching, forming learner groups, and instructor/learner interaction challenging. In this context, this research proposes a predictive model that classifies MOOC learners into three classes: learners at risk of dropping out, learners likely to fail, and learners on the road to success. Automatically determining the attributes relevant for analysis, classification, interpretation, and prediction from MOOC learner data will allow instructors to streamline their interventions for each class. To this end, we present an approach based on feature selection methods and ensemble machine learning algorithms. The proposed model was tested on a dataset of over 5,500 learners from two Stanford University MOOCs. To attest to its performance (98.6%), a comparison was carried out using several performance measures.
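The abstract names the ingredients of the approach (feature selection, ensemble learners, and a comparison on several performance measures) without detailing them. The following is a minimal, standard-library-only sketch of that kind of pipeline under stated assumptions; it is not the authors' implementation. The Fisher-score filter, the candidate learners (nearest centroid, 3-nearest-neighbours, a majority-class baseline, and a hard-voting ensemble of the three), the ranking by macro-F1 then accuracy, and the toy features (`videos_watched`, `quiz_average`, `forum_posts`) are all hypothetical choices made for illustration. Features are selected on the training split only, so validation scores are not inflated by leakage.

```python
import math
from collections import Counter
from statistics import mean, pvariance

# Hypothetical sketch: a filter feature-selection step (Fisher score) followed by a
# "competition" between candidate learners scored on a held-out validation split.

def fisher_score(column, labels):
    """Between-class scatter divided by within-class scatter for one numeric feature."""
    overall = mean(column)
    between = within = 0.0
    for c in sorted(set(labels)):
        vals = [v for v, t in zip(column, labels) if t == c]
        between += len(vals) * (mean(vals) - overall) ** 2
        within += len(vals) * pvariance(vals)
    if within == 0:
        return math.inf if between > 0 else 0.0
    return between / within

def select_features(X, y, k):
    """Indices of the k highest-scoring features, best first."""
    scores = [fisher_score([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def nearest_centroid(X, y):
    """Fit one mean vector per class; predict the class whose centroid is closest."""
    centroids = {c: [mean(col) for col in zip(*[x for x, t in zip(X, y) if t == c])]
                 for c in sorted(set(y))}
    return lambda x: min(centroids, key=lambda c: sq_dist(x, centroids[c]))

def k_nearest(k):
    """Return a fit function for a k-nearest-neighbour majority vote."""
    def fit(X, y):
        def predict(x):
            nearest = sorted(zip(X, y), key=lambda pair: sq_dist(x, pair[0]))[:k]
            return Counter(t for _, t in nearest).most_common(1)[0][0]
        return predict
    return fit

def majority(X, y):
    """Baseline: always predict the most frequent training class."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label

def voting(*fits):
    """Simple ensemble: hard majority vote over several fitted learners."""
    def fit(X, y):
        predictors = [f(X, y) for f in fits]
        return lambda x: Counter(p(x) for p in predictors).most_common(1)[0][0]
    return fit

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so small classes weigh as much as large ones."""
    f1s = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

def compete(candidates, train, valid, k):
    """Select features on the training split, then rank every candidate on validation."""
    (X_tr, y_tr), (X_va, y_va) = train, valid
    keep = select_features(X_tr, y_tr, k)
    def project(X):
        return [[row[j] for j in keep] for row in X]
    results = {}
    for name, fit in candidates.items():
        predict = fit(project(X_tr), y_tr)
        preds = [predict(x) for x in project(X_va)]
        results[name] = (macro_f1(y_va, preds), accuracy(y_va, preds))
    winner = max(results, key=results.get)  # macro-F1 first, accuracy breaks ties
    return winner, keep, results

# Made-up toy learners: [videos_watched, quiz_average, forum_posts, constant]
train_X = [[1, 10, 0, 5], [2, 15, 1, 5], [1, 12, 0, 5], [8, 40, 2, 5], [9, 45, 1, 5],
           [7, 38, 3, 5], [15, 85, 4, 5], [14, 90, 2, 5], [16, 88, 3, 5]]
train_y = ["dropout"] * 3 + ["fail"] * 3 + ["success"] * 3
valid_X = [[2, 11, 0, 5], [1, 14, 1, 5], [8, 42, 2, 5], [9, 39, 1, 5],
           [15, 87, 3, 5], [14, 86, 4, 5]]
valid_y = ["dropout"] * 2 + ["fail"] * 2 + ["success"] * 2
candidates = {"nearest_centroid": nearest_centroid, "3-nn": k_nearest(3), "majority": majority,
              "vote": voting(nearest_centroid, k_nearest(3), majority)}
winner, keep, results = compete(candidates, (train_X, train_y), (valid_X, valid_y), k=2)
print(winner, keep, results)
```

On this toy data the constant column scores 0 and is discarded, and `quiz_average` and `videos_watched` are kept. Several learners tie at a perfect validation score, and `max` simply returns the first tied candidate in insertion order; a real study would break such ties with further measures or cross-validation rather than dictionary order.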

Notes

https://datastage.stanford.edu/

Acknowledgements

This research was conducted through Stanford University’s Center for Advanced Research through Online Learning (CAROL), which we thank greatly for all the facilities it provided. We also express our sincere gratitude to Ms. Kathy Mirzaei for her responsiveness and her collaboration. We warmly thank Mr. Mitchell Stevens, Director of Digital Research and Planning, and the entire CAROL commission for the trust they placed in us.

Author information

Authors and Affiliations

Computer Science Department, CADI AYYAD University, Marrakech, Morocco
Mourdi Youssef, Sadgal Mohammed, El Kabtane Hamada & Berrada Fathi Wafaa

Corresponding author

Correspondence to Mourdi Youssef.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Youssef, M., Mohammed, S., Hamada, E.K. et al. A predictive approach based on efficient feature selection and learning algorithms’ competition: Case of learners’ dropout in MOOCs. Educ Inf Technol 24, 3591–3618 (2019). https://doi.org/10.1007/s10639-019-09934-y

Received: 29 March 2019
Accepted: 16 May 2019
Published: 26 June 2019
Issue Date: November 2019
DOI: https://doi.org/10.1007/s10639-019-09934-y
