Machine learning models for predicting school dropout rates in rural contexts
DOI:
https://doi.org/10.64325/IJIP.v3i1.56
Keywords:
school dropout, machine learning, rural education, predictive models, systematic review, artificial intelligence
Abstract
This study synthesizes the academic evidence on the use of machine learning models to predict school dropout in basic and secondary education in rural contexts (2015-2025). A systematic review was conducted following the PRISMA 2020 guidelines, searching Scopus, Web of Science, SciELO, and Redalyc. After applying eligibility criteria focused on rurality and vulnerable populations, 25 articles were selected. The results show a predominance of ensemble models, especially Random Forest and XGBoost, owing to their performance on scarce and imbalanced tabular data. A shift in the most influential predictors is also observed: structural and geographic variables (home-school distance, maternal educational level, and agricultural cycles) outweigh prior academic performance. In parallel, model explainability (XAI) emerges as a key requirement for implementing Early Warning Systems that guide timely decisions. In summary, the reviewed studies indicate that machine learning outperforms traditional statistical approaches for detecting dropout risk in rural areas. However, its effectiveness depends on integrating contextualized data and translating algorithmic results into pedagogical actions that teachers, administrators, and local authorities can understand. The studies also report concerns about the quality and availability of administrative data, as well as about ethics, privacy, and bias in rural communities.
License
Copyright (c) 2026 Oscar Iván Montiel Petro (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish in this journal agree to the following terms:
a) Authors retain copyright and grant the journal the right of first publication, with the work licensed under a Creative Commons Attribution 4.0 license, which allows third parties to use the published work provided they attribute the authorship of the work and acknowledge its first publication in this journal.
b) Authors may enter into other independent and additional contractual agreements for the non-exclusive distribution of the version of the article published in this journal (e.g., including it in an institutional repository or publishing it in a book) provided they clearly indicate that the work was first published in this journal.
c) Authors are permitted and encouraged to share their work online (e.g., in institutional repositories or on personal websites) before and during the manuscript submission process, as this can lead to productive exchanges and greater and faster citation of the published work.