A predictive approach based on efficient feature selection and learning algorithms’ competition: Case of learners’ dropout in MOOCs

Education and Information Technologies

Abstract

MOOCs are increasingly used in the pedagogical experiments of universities whose infrastructure cannot accommodate a growing mass of learners. These universities aim to complement their initial training with distance learning courses. Unfortunately, efforts to make this pedagogical model succeed face a dropout rate among enrolled learners that reaches 90% in some cases. This makes coaching, the formation of learner groups, and instructor/learner interaction challenging. In this context, this research proposes a predictive model that classifies MOOC learners into three classes: learners at risk of dropping out, those who are likely to fail, and those who are on the road to success. Automatically determining the attributes relevant for analysis, classification, interpretation and prediction from MOOC learner data will allow instructors to streamline interventions for each class. To this end, we present an approach based on feature selection methods and ensemble machine learning algorithms. The proposed model was tested on a dataset of over 5,500 learners in two Stanford University MOOCs. To attest to its performance (98.6%), a comparison was carried out based on several performance measures.
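
The two ingredients named in the abstract, feature selection followed by an ensemble of learners, can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy data, the feature meanings (videos watched, quizzes submitted, a noise column), the correlation-based filter, the three simple base learners and the majority vote are all assumptions chosen so that the example runs with only the Python standard library. The class codes 0, 1 and 2 stand for the three classes in the abstract (dropout risk, likely to fail, on the road to success).

```python
from collections import Counter
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def select_features(rows, labels, k):
    """Filter-style selection: keep the k features most correlated with the label."""
    n_features = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], labels)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def majority_vote(predictions):
    """Most frequent class; ties go to the class seen first."""
    return Counter(predictions).most_common(1)[0][0]


def nearest_centroid(train_rows, train_labels):
    """Base learner 1: assign the class whose mean feature vector is closest."""
    groups = {}
    for row, label in zip(train_rows, train_labels):
        groups.setdefault(label, []).append(row)
    centroids = {c: [mean(col) for col in zip(*members)] for c, members in groups.items()}
    return lambda row: min(centroids, key=lambda c: sq_dist(row, centroids[c]))


def knn(train_rows, train_labels, k):
    """Base learners 2 and 3: k-nearest neighbours with a vote among neighbours."""
    def predict(row):
        order = sorted(range(len(train_rows)), key=lambda i: sq_dist(row, train_rows[i]))
        return majority_vote([train_labels[i] for i in order[:k]])
    return predict


# Hypothetical toy data: [videos_watched, quizzes_submitted, noise]
rows = [[1, 0, 5], [2, 0, 1], [8, 1, 4], [9, 2, 2], [15, 8, 5], [16, 9, 1]]
labels = [0, 0, 1, 1, 2, 2]  # 0 = dropout risk, 1 = likely to fail, 2 = success

selected = select_features(rows, labels, k=2)
reduced = [[r[j] for j in selected] for r in rows]
models = [nearest_centroid(reduced, labels), knn(reduced, labels, 1), knn(reduced, labels, 3)]


def predict_learner(row):
    """Project a learner onto the selected features, then combine the base learners."""
    x = [row[j] for j in selected]
    return majority_vote([model(x) for model in models])


print(selected, predict_learner([14, 7, 3]), predict_learner([1, 0, 3]))
```

Majority voting is used here only because it is the simplest way to combine learners; other ensemble schemes, such as stacking with a trained meta-learner, would replace `majority_vote` in `predict_learner`.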

Notes

  1. https://datastage.stanford.edu/

Acknowledgements

This research was carried out through Stanford University’s Advanced Research Center on Online Learning (CAROL), which we thank immensely for all the facilities it provided. We also wish to express our full gratitude to Ms. Kathy Mirzaei for her responsiveness and her collaboration. We warmly thank Mr. Mitchell Stevens, Director of Digital Research and Planning, as well as the entire CAROL commission for the trust they have placed in us.

Author information

Corresponding author

Correspondence to Mourdi Youssef.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Youssef, M., Mohammed, S., Hamada, E.K. et al. A predictive approach based on efficient feature selection and learning algorithms’ competition: Case of learners’ dropout in MOOCs. Educ Inf Technol 24, 3591–3618 (2019). https://doi.org/10.1007/s10639-019-09934-y
