Not Another Hardcoded Solution to the Student Dropout Prediction Problem: A Novel Approach Using Genetic Algorithms for Feature Selection

  • Conference paper
Intelligent Tutoring Systems (ITS 2022)

Abstract

Preventing student dropout is a challenge for higher education institutions (HEIs) that has worsened with COVID-19 and online classes. Despite several research attempts to understand and reduce dropout rates in HEIs, the solutions found in the literature are often hardcoded, which makes reuse difficult and slows progress in the area. To advance the area, this paper introduces a novel, portable approach based on genetic algorithms that automatically selects the optimal subset of features for dropout prediction in HEIs. Our approach is validated on a dataset containing approximately 248k student records from a Brazilian university. The results show that the proposed approach significantly increases the accuracy of dropout prediction, outperforming previous work in the literature. Our contributions are fourfold: (i) a novel, efficient, and accurate automatic feature selector that does not require expert knowledge; (ii) an adaptive deep learning model for dropout prediction in sequential data sets; (iii) a portable solution that can be applied to other data sets and degrees; and (iv) an analysis and discussion of the performance of feature selection and predictive models for dropout prediction.
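The abstract does not spell out the genetic algorithm, so the following is only a minimal sketch of the general idea of genetic-algorithm feature selection, not the authors' implementation. Everything in it is an assumption made for illustration: the synthetic data, the nearest-centroid classifier standing in for the paper's deep learning model, the helper names (`make_data`, `centroid_accuracy`, `select_features`), and the operators and parameters (truncation selection, one-point crossover, bit-flip mutation, population size, mutation rate).

```python
import random

def make_data(rng, n=200, n_informative=3, n_noise=7):
    """Synthetic stand-in for student records (assumption, not the paper's data).
    Only the first n_informative columns carry signal about the label."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [rng.gauss(2.0 * label, 1.0) for _ in range(n_informative)]
        noise = [rng.gauss(0.0, 3.0) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y

def centroid_accuracy(X_tr, y_tr, X_va, y_va, mask):
    """Fitness: validation accuracy of a nearest-centroid classifier
    restricted to the columns where mask is 1."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot predict anything
    centroids = {}
    for c in (0, 1):
        rows = [x for x, lab in zip(X_tr, y_tr) if lab == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, lab in zip(X_va, y_va):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c]))
                for c in (0, 1)}
        correct += min(dist, key=dist.get) == lab
    return correct / len(y_va)

def select_features(X_tr, y_tr, X_va, y_va, rng,
                    pop_size=20, generations=15, mutation_rate=0.1):
    """Evolve binary masks over the feature columns; return (best_mask, score)."""
    n_features = len(X_tr[0])

    def fitness(mask):
        return centroid_accuracy(X_tr, y_tr, X_va, y_va, mask)

    # Include the all-features mask so the result is never worse than
    # using every feature on the validation split.
    population = [[1] * n_features]
    population += [[rng.randint(0, 1) for _ in range(n_features)]
                   for _ in range(pop_size - 1)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]      # truncation selection
        children = [scored[0][:]]              # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]            # bit-flip mutation
            children.append(child)
        population = children
        best = max(population + [best], key=fitness)
    return best, fitness(best)

rng = random.Random(0)
X, y = make_data(rng)
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]
mask, score = select_features(X_tr, y_tr, X_va, y_va, rng)
print("selected columns:", [j for j, keep in enumerate(mask) if keep])
print("validation accuracy:", round(score, 3))
```

Because the fitness function only needs a feature mask and returns a score, the classifier can be swapped for any other model without touching the search loop, which is one way such a selector can stay portable across data sets.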

Notes

  1. https://scikit-learn.org/

  2. This information is important for the reproducibility of the results.

Author information

Corresponding author

Correspondence to Yixin Cheng.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cheng, Y., Pereira Nunes, B., Manrique, R. (2022). Not Another Hardcoded Solution to the Student Dropout Prediction Problem: A Novel Approach Using Genetic Algorithms for Feature Selection. In: Crossley, S., Popescu, E. (eds) Intelligent Tutoring Systems. ITS 2022. Lecture Notes in Computer Science, vol 13284. Springer, Cham. https://doi.org/10.1007/978-3-031-09680-8_23

  • DOI: https://doi.org/10.1007/978-3-031-09680-8_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-09679-2

  • Online ISBN: 978-3-031-09680-8

  • eBook Packages: Computer Science, Computer Science (R0)
