A methodology to design, develop, and evaluate machine learning models for predicting dropout in school systems: the case of Chile

Education and Information Technologies

Abstract

School dropout is a structural problem that permanently penalizes students and society through low-qualification jobs, higher poverty levels, lower life expectancy, lower pensions, and a higher economic burden for governments. Given these severe consequences and the surge of the problem due to the COVID-19 pandemic, in this paper we propose a methodology to design, develop, and evaluate a machine learning model for predicting dropout in school systems. The methodology introduces the steps necessary to develop a robust model that estimates each student’s individual risk of dropping out of school. As an advance over previous research, this proposal focuses on analyzing students’ individual trajectories, incorporating the student’s situation at the school, family, and other levels, as well as changes and the accumulation of events, to predict dropout. Following the methodology, we create a model for the Chilean case based mostly on administrative data from the educational system, and according to known factors associated with school dropout. Our results are better than those of previous research with a relevant sample size, with a predictive capability 20% higher for actual dropout cases. Also, in contrast to previous work, including non-individual dimensions contributes substantively to the prediction of leaving school. We also illustrate applications of the model for the Chilean case to support public policy decision making, such as profiling schools for qualitative studies of pedagogic practices, profiling students’ dropout trajectories, and simulating scenarios.

Data availability

Special thanks to the Center of Studies of the Ministry of Education, the Agency of Quality of Education, and JUNJI for providing special datasets to develop the model.

The data that support the findings of this study are available from the Ministry of Education – Open data platform, but restrictions apply to the availability of these data, which were used under licence for the current study and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of the Ministry of Education, the Agency of Quality of Education, and/or JUNJI.

Notes

  1. Year in which a student enrolls in first grade.

  2. There is a fourth school category (delegated administration), in which schools are funded through charters, with basal funding provided to publicly owned schools whose administration is delegated to private agents (Browne, 2017). Nevertheless, since there are only 70 schools in this category (41,578 students in 2019, 1.4% of all students that year) and it differs notably from the other categories in the ownership, funding, and administration of its schools, we decided to omit it from most of the reports in this article.

  3. https://www.agenciaeducacion.cl/simce/

Funding

We acknowledge support from the ANID/PIA/Basal Funds for Centers of Excellence FB0003 and ANID-FONDEF IT17I0006 grants.

Author information

Corresponding author

Correspondence to Patricio Rodríguez.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

Table 10 Criteria used to score each student trajectory consistency

Appendix B List of variables used and their significance in the final model

Table 11 (Only codes and contributions for the most relevant variables are shown)

Appendix C

Table 12 Clusters description

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Rodríguez, P., Villanueva, A., Dombrovskaia, L. et al. A methodology to design, develop, and evaluate machine learning models for predicting dropout in school systems: the case of Chile. Educ Inf Technol 28, 10103–10149 (2023). https://doi.org/10.1007/s10639-022-11515-5

  • DOI: https://doi.org/10.1007/s10639-022-11515-5
