Abstract
School dropout is a structural problem that permanently penalizes students and society through low-qualification jobs, higher poverty levels, lower life expectancy, lower pensions, and a higher economic burden for governments. Given these severe consequences and the surge of the problem due to the COVID-19 pandemic, in this paper we propose a methodology to design, develop, and evaluate a machine learning model for predicting dropout in school systems. In this methodology, we introduce the steps necessary to develop a robust model that estimates each student's individual risk of dropping out of school. As an advance over previous research, this proposal focuses on analyzing the individual trajectories of students, incorporating the student's situation at the school and family levels, among others, as well as changes and the accumulation of events, to predict dropout. Following the methodology, we create a model for the Chilean case based mostly on administrative data from the educational system and on known factors associated with school dropout. Our results improve on those of previous research with a relevant sample size, with a predictive capability 20% higher for actual dropout cases. Also, in contrast to previous work, including non-individual dimensions makes a substantive contribution to the prediction of leaving school. We also illustrate applications of the model for the Chilean case to support public policy decision making, such as profiling schools for qualitative studies of pedagogic practices, profiling students' dropout trajectories, and simulating scenarios.















Data availability
Special thanks to the Center of Studies of the Ministry of Education, the Agency of Quality of Education, and JUNJI for providing special datasets to develop the model.
The data that support the findings of this study are available from the Ministry of Education's Open Data platform, but restrictions apply to the availability of these data, which were used under licence for the current study and so are not publicly available. The data are, however, available from the authors upon reasonable request and with the permission of the Ministry of Education, the Agency of Quality of Education, and/or JUNJI.
Notes
Year in which a student enrolls in first grade.
There is a fourth school category (delegated administration), in which schools are funded through charters, with baseline funding provided to publicly owned schools whose administration is delegated to private agents (Browne, 2017). Nevertheless, since there are only 70 schools in this category (41,578 students in 2019, 1.4% of all students enrolled that year) and they differ notably from the other categories in ownership, funding, and administration, we decided to omit it from most of the reports in this article.
References
Adelman, M., Haimovich, F., Ham, A., & Vazquez, E. (2018). Predicting school dropout with administrative data: New evidence from Guatemala and Honduras. Education Economics, 26(4), 356–372. https://doi.org/10.1080/09645292.2018.1433127
Anderson, S., Uribe, M., & Valenzuela, J. P. (2021). Reforming public education in Chile: The creation of local education services. Educational Management Administration & Leadership, 1741143220983327. https://doi.org/10.1177/1741143220983327.
Bentéjac, C., Csörgő, A., & Martínez-Muñoz, G. (2021). A comparative analysis of gradient boosting algorithms. Artificial Intelligence Review, 54(3), 1937–1967. https://doi.org/10.1007/s10462-020-09896-5
Bergstra, J., Bardenet, R., Bengio, Y., & Kégl, B. (2011). Algorithms for Hyper-Parameter Optimization. Advances in Neural Information Processing Systems, 24. https://papers.nips.cc/paper/2011/hash/86e8f7ab32cfd12577bc2619bc635690-Abstract.html.
Boniolo, P., & Najmias, C. (2018). School dropout and school lag in Argentina: A social classes approach. Tempo Social, 30(3), 217–247. https://doi.org/10.11606/0103-2070.ts.2018.121349.
Browne, M. (2017). Análisis del Sistema de Administración Delegada creada por el DL No 3166 de 1980. Ministerio de Educación-SETP. http://biblioteca.digital.gob.cl/handle/123456789/897. Accessed 20 Aug 2022.
Buenadicha, C., Galdon, G., Hermosilla, M., Loewe, D., & Pombo, C. (2019). La gestión ética de los datos. Inter-American Development Bank. https://doi.org/10.18235/0001623.
Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794. https://doi.org/10.1145/2939672.2939785.
Dos Santos, E. M., Sabourin, R., & Maupin, P. (2009). Overfitting cautious selection of classifier ensembles with genetic algorithms. Information Fusion, 10(2), 150–162. https://doi.org/10.1016/j.inffus.2008.11.003
Dussaillant, F. (2017). Deserción escolar en Chile. Propuestas para la investigación y la política pública. Documento No 18, 1–18. Available at: https://gobierno.udd.cl/cpp/files/2020/10/18-Deserción.pdf. Accessed 20 Aug 2022.
Ecker-Lyster, M., & Niileksela, C. (2016). Keeping Students on Track to Graduate: A Synthesis of School Dropout Trends, Prevention, and Intervention Initiatives. The Journal of at-Risk Issues, 19(2), 24–31.
Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 226–231. Available at: https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf.
Gil, A. J., Antelm-Lanzat, A. M., Cacheiro-González, M. L., & Pérez-Navío, E. (2019). School dropout factors: A teacher and school manager perspective. Educational Studies, 45(6), 756–770. https://doi.org/10.1080/03055698.2018.1516632
Hirakawa, Y., & Taniguchi, K. (2021). School dropout in primary schools in rural Cambodia: School-level and student-level factors. Asia Pacific Journal of Education, 41(3), 527–542. https://doi.org/10.1080/02188791.2020.1832042
Höfter, R. H. (2006). Private health insurance and utilization of health services in Chile. Applied Economics, 38(4), 423–439. https://doi.org/10.1080/00036840500392797
Jaderberg, M., Dalibard, V., Osindero, S., Czarnecki, W. M., Donahue, J., Razavi, A., Vinyals, O., Green, T., Dunning, I., Simonyan, K., Fernando, C., & Kavukcuoglu, K. (2017). Population Based Training of Neural Networks. ArXiv:1711.09846 [Cs]. http://arxiv.org/abs/1711.09846.
Jena, M., & Dehuri, S. (2020). DecisionTree for Classification and Regression: A State-of-the Art Review. Informatica, 44(4), 4. https://doi.org/10.31449/inf.v44i4.3023.
Kattan, R. B., & Székely, M. (2017). Analyzing Upper Secondary Education Dropout in Latin America through a Cohort Approach. Journal of Education and Learning, 6(4), 12–39. https://doi.org/10.5539/jel.v6n4p12
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., & Liu, T.-Y. (2017). LightGBM: a highly efficient gradient boosting decision tree. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17). Curran Associates Inc., Red Hook, NY, USA, 3149–3157
Khan, M. J., & Ahmed, J. (2021). Child education in the time of pandemic: Learning loss and dropout. Children and Youth Services Review, 127, 106065. https://doi.org/10.1016/j.childyouth.2021.106065
Kursa, M. B., Jankowski, A., & Rudnicki, W. R. (2010). Boruta – A System for Feature Selection. Fundamenta Informaticae, 101(4), 271–285. https://doi.org/10.3233/FI-2010-288
Ladd, H., & Fiske, E. (2020). International perspectives on school choice. Routledge.
Lee, S., & Chung, J. Y. (2019). The Machine Learning-Based Dropout Early Warning System for Improving the Performance of Dropout Prediction. Applied Sciences, 9(15), 3093. https://doi.org/10.3390/app9153093
Lee-St John, T. J., Walsh, M. E., Raczek, A. E., Vuilleumier, C. E., Foley, C., Heberle, A., Sibley, E., & Dearing, E. (2018). The Long-Term Impact of Systemic Student Support in Elementary School: Reducing High School Dropout. Aera Open, 4(4). https://doi.org/10.1177/2332858418799085.
Levin, H. M., Belfield, C., Hollands, F., & Bowden, A. B. (2012). Cost-Effectiveness analysis of interventions that improve high school completion. Center for Benefit-Cost Studies of Education 34. https://repository.upenn.edu/cbcse/34. Accessed 20 Aug 2022
Lundberg, S. M., Erion, G. G., & Lee, S.-I. (2019). Consistent Individualized Feature Attribution for Tree Ensembles. ArXiv:1802.03888 [Cs, Stat]. http://arxiv.org/abs/1802.03888.
Lundberg, S. M., Nair, B., Vavilala, M. S., Horibe, M., Eisses, M. J., Adams, T., Liston, D. E., Low, D.K.-W., Newman, S.-F., Kim, J., & Lee, S.-I. (2018). Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nature Biomedical Engineering, 2(10), 749–760. https://doi.org/10.1038/s41551-018-0304-0
Márquez-Vera, C., Cano, A., Romero, C., Noaman, A. Y. M., Fardoun, H. M., & Ventura, S. (2016). Early dropout prediction using data mining: A case study with high school students. Expert Systems, 33(1), 107–124. https://doi.org/10.1111/exsy.12135
McInnes, L., Healy, J., & Melville, J. (2020). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction (arXiv:1802.03426). arXiv. https://doi.org/10.48550/arXiv.1802.03426.
Mduma, N., Kalegele, K., & Machuve, D. (2019). A Survey of Machine Learning Approaches and Techniques for Student Dropout Prediction. Data Science Journal, 18, 14. https://doi.org/10.5334/dsj-2019-014
Misra, P., & Yadav, A. (2020). Improving the classification accuracy using recursive feature elimination with cross-validation. International Journal on Emerging Technologies, 11(3), 659–665.
Şara, N.-B., Halland, R., Igel, C., & Alstrup, S. (2015). High-school dropout prediction using machine learning: A Danish large-scale study. In M. Verleysen (Ed.), Proceedings. ESANN 2015: 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (pp. 319–324).
OECD. (2010). Overcoming school failure: Policies that work. OECD project description, (April). Available at https://www.oecd.org/education/school/45171670.pdf
OECD. (2020). Education at a Glance 2020: OECD Indicators. Organisation for Economic Co-operation and Development. https://www.oecd-ilibrary.org/education/education-at-a-glance-2020_69096873-en. Accessed 20 Aug 2022.
Pereira de Souza, C. M., Pereira, J. M., & de Jesus Ranke, M. da C. (2020). Reflexes of the Pandemic in school dropout/exit: The democratization of access and permanence. Revista Brasileira De Educacao Do Campo-Brazilian Journal of Rural Education, 5, e10844. https://doi.org/10.20873/uft.rbec.e10844.
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., & Gulin, A. (2019). CatBoost: Unbiased boosting with categorical features (arXiv:1706.09516). arXiv. https://doi.org/10.48550/arXiv.1706.09516.
Sahin, S., Arseven, Z., & Kilic, A. (2016). Causes of Student Absenteeism and School Dropouts. International Journal of Instruction, 9(1), 195–210. https://doi.org/10.12973/iji.2016.9115a.
Sansone, D. (2019). Beyond Early Warning Indicators: High School Dropout and Machine Learning. Oxford Bulletin of Economics and Statistics, 81(2), 456–485. https://doi.org/10.1111/obes.12277
Sharma, P., Mirzan, S. R., Bhandari, A., Pimpley, A., Eswaran, A., Srinivasan, S., & Shao, L. (2020). Evaluating Tree Explanation Methods for Anomaly Reasoning: A Case Study of SHAP TreeExplainer and TreeInterpreter. In G. Grossmann & S. Ram (Eds.), Advances in Conceptual Modeling (pp. 35–45). Springer International Publishing. https://doi.org/10.1007/978-3-030-65847-2_4.
Sorensen, L. C. (2019). “Big Data” in Educational Administration: An Application for Predicting School Dropout Risk. Educational Administration Quarterly, 55(3), 404–446. https://doi.org/10.1177/0013161X18799439
Studer, S., Bui, T. B., Drescher, C., Hanuschkin, A., Winkler, L., Peters, S., & Müller, K.-R. (2021). Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology. Machine Learning and Knowledge Extraction, 3(2), 392–413. https://doi.org/10.3390/make3020020
UNESCO. (2012). International Standard Classification of Education ISCED 2011. UNESCO Institute of Statistics, Montreal. Available at http://uis.unesco.org/sites/default/files/documents/international-standard-classification-of-education-isced-2011-en.pdf. Accessed 20 Aug 2022
UNESCO. (2020). UNESCO COVID-19 education response: How many students are at risk of not returning to school? Advocacy paper. UNESCO Paris. Available at https://unesdoc.unesco.org/ark:/48223/pf0000373992. Accessed 20 Aug 2022.
Valenzuela, J. P., & Allende, C. (2014). Trayectorias de mejoramiento en el Sistema Escolar Chileno: Las escuelas de educación básica 2002 - 2010. Apuntes sobre Mejoramiento Escolar N°1, Enero 2014. Anillo de Ciencias Sociales sobre Mejoramiento de la Efectividad Escolar en Chile. https://www.mejoramientoescolar.cl/download.php?file=recursos/nota_tecnica.pdf. Accessed 20 Aug 2022.
Weybright, E. H., Caldwell, L. L., Wegner, L., & Smith, E. A. (2017). Predicting secondary school dropout among South African adolescents: A survival analysis approach. South African Journal of Education, 37(2), 1–11. https://doi.org/10.15700/saje.v37n2a1353.
Yoshida, S. (2020). Verification of Usefulness of SHAP values in Interpretation of Decision Tree Models. The Japanese Society for Artificial Intelligence. https://confit.atlas.jp/guide/event/jsai2020/subject/3E5-GS-2-04/detail. Accessed 20 Aug 2022.
Zaff, J. F., Donlan, A., Gunning, A., Anderson, S. E., Mcdermott, E., & Sedaca, M. (2017). Factors that Promote High School Graduation: A Review of the Literature. Educational Psychology Review, 447–476. https://doi.org/10.1007/s10648-016-9363-5.
Funding
We acknowledge the support of the ANID/PIA/Basal Funds for Centers of Excellence FB0003 and ANID-FONDEF IT17I0006 grants.
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Rodríguez, P., Villanueva, A., Dombrovskaia, L. et al. A methodology to design, develop, and evaluate machine learning models for predicting dropout in school systems: the case of Chile. Educ Inf Technol 28, 10103–10149 (2023). https://doi.org/10.1007/s10639-022-11515-5
DOI: https://doi.org/10.1007/s10639-022-11515-5