Abstract
Preventing student dropout is a challenge for higher education institutions (HEIs) that has worsened with COVID-19 and online classes. Despite several research attempts to understand and reduce dropout rates in HEIs, the solutions found in the literature are often hardcoded, which makes reuse difficult and slows progress in the area. To advance the area, this paper introduces a novel, portable approach based on genetic algorithms that automatically selects the optimal subset of features for dropout prediction in HEIs. Our approach is validated on a dataset containing approximately 248k student records from a Brazilian university. The results show that the proposed approach significantly increases the accuracy of dropout prediction, outperforming previous work in the literature. Our contributions in this paper are fourfold: (i) a novel, efficient, and accurate automatic feature selector that does not require expert knowledge; (ii) an adaptive deep learning model for dropout prediction in sequential data sets; (iii) a portable solution that can be applied to other data sets/degrees; and (iv) an analysis and discussion of the performance of feature selection and predictive models for dropout prediction.
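To make the idea of genetic-algorithm feature selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes each candidate solution is a bit mask over the features and that a caller-supplied `fitness` function scores a mask. In a setting like the one described above, that score would plausibly be the validation performance of a predictive model trained on the selected features; here a hypothetical `toy_fitness` with a made-up set of relevant features stands in for it. The function names, the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), and all parameter values are illustrative assumptions.

```python
import random
from typing import Callable, List, Tuple

Mask = Tuple[int, ...]  # 1 = feature kept, 0 = feature dropped


def ga_select_features(
    n_features: int,  # must be >= 2 for one-point crossover
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Mask, List[float]]:
    """Return the best mask found and the best fitness seen per generation."""
    rng = random.Random(seed)
    population = [
        tuple(rng.randint(0, 1) for _ in range(n_features))
        for _ in range(pop_size)
    ]
    history: List[float] = []

    def tournament(scored):
        # Draw two individuals at random and keep the fitter one.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = sorted(((fitness(m), m) for m in population), reverse=True)
        history.append(scored[0][0])
        next_pop = [scored[0][1]]  # elitism: the best mask always survives
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: toggle each bit with a small probability.
            child = tuple(
                bit ^ 1 if rng.random() < mutation_rate else bit
                for bit in child
            )
            next_pop.append(child)
        population = next_pop

    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


# Hypothetical stand-in for a model-based score: reward features from a
# made-up "relevant" set and lightly penalise every other selected feature.
RELEVANT = {0, 3, 5}


def toy_fitness(mask: Mask) -> float:
    hits = sum(mask[i] for i in RELEVANT)
    extras = sum(mask) - hits
    return hits - 0.1 * extras


best_mask, history = ga_select_features(8, toy_fitness)
print("selected feature indices:", [i for i, bit in enumerate(best_mask) if bit])
```

Because the best mask of each generation is copied unchanged into the next, the recorded best fitness never decreases. Swapping `toy_fitness` for a function that trains and validates a classifier on the masked columns would turn this into a wrapper-style selector, at the cost of one model fit per evaluated mask.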
Notes
This information is important for the reproducibility of the results.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cheng, Y., Pereira Nunes, B., Manrique, R. (2022). Not Another Hardcoded Solution to the Student Dropout Prediction Problem: A Novel Approach Using Genetic Algorithms for Feature Selection. In: Crossley, S., Popescu, E. (eds) Intelligent Tutoring Systems. ITS 2022. Lecture Notes in Computer Science, vol 13284. Springer, Cham. https://doi.org/10.1007/978-3-031-09680-8_23
DOI: https://doi.org/10.1007/978-3-031-09680-8_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09679-2
Online ISBN: 978-3-031-09680-8
eBook Packages: Computer Science, Computer Science (R0)