Abstract
Early identification of vulnerable students who are prone to dropping out is critical for devising effective educational retention strategies. Drawing on Activity Theory, we address this challenge by treating students’ online activities as predictors of their academic performance. Specifically, we present six artificial intelligence and related prediction models, in individual and ensemble structures, for the classification and multi-objective optimization tasks involved in early prediction of students’ performance. A real database comprising the online learning activities of 2544 students over 2 years in 84 science, engineering, and technology courses from an open distance education institution is used for evaluation. Compared with other studies in the literature, the large numbers of students and courses involved in this study pose a considerable challenge, because they increase both the complexity of the problem and the dimensionality of the data. The empirical results reveal statistically significant improvements of the ensemble-based models over the individual models in predicting students’ performance. Implications of the results are analyzed and discussed from the Activity Theory perspective.
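The distinction the abstract draws between individual and ensemble structures can be illustrated with a minimal sketch of majority voting. This is not the authors' method: the abstract does not specify how the six models are combined. The three rule-based classifiers, the feature names (`logins`, `forum_posts`, `quiz_attempts`), the thresholds, and the class labels below are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

# Hypothetical feature record: online-activity counts for one student.
Features = Dict[str, float]
Classifier = Callable[[Features], str]


def low_logins(x: Features) -> str:
    """Hypothetical individual model: few logins suggests risk."""
    return "at_risk" if x["logins"] < 5 else "on_track"


def low_forum(x: Features) -> str:
    """Hypothetical individual model: no forum posts suggests risk."""
    return "at_risk" if x["forum_posts"] < 1 else "on_track"


def low_quizzes(x: Features) -> str:
    """Hypothetical individual model: few quiz attempts suggests risk."""
    return "at_risk" if x["quiz_attempts"] < 2 else "on_track"


def majority_vote(models: Sequence[Classifier], x: Features) -> str:
    """Ensemble prediction: the label returned by most individual models.

    With an odd number of models and two labels, a tie cannot occur.
    """
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    models = [low_logins, low_forum, low_quizzes]
    student = {"logins": 3, "forum_posts": 2, "quiz_attempts": 1}
    # Individual models disagree; the ensemble reports the majority label.
    print([m(student) for m in models])
    print(majority_vote(models, student))
```

In such a scheme a single model's error on one student can be outvoted by the others, which is one common motivation for ensembles. Whether that mechanism explains the improvements reported here is not something the abstract states.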
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Human and animal rights
This article does not contain any studies with animals performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Cite this article
Tan, C.J., Lim, T.Y., Liew, T.K. et al. An intelligent tool for early drop-out prediction of distance learning students. Soft Comput 26, 5901–5917 (2022). https://doi.org/10.1007/s00500-021-06604-5
DOI: https://doi.org/10.1007/s00500-021-06604-5