Abstract
Predicting students’ performance in advance can support the learning process: if “at-risk” students are identified early, educators can provide them with the necessary educational support. Despite this potential, performance-prediction technology has not been widely adopted in education because of practical limitations. We propose a practical method for predicting students’ performance in educational settings using machine learning and explainable artificial intelligence (XAI) techniques. To ascertain the perspectives of educational stakeholders, we conducted qualitative research in which twelve people, including educators, parents of K-12 students, and policymakers, participated in a focus group interview. An initial set of practical features was chosen based on the participants’ responses, and the final set was then selected through correlation analysis. To verify whether at-risk students could be distinguished using the selected features, we experimented with various machine learning algorithms: Logistic Regression, Decision Tree, Random Forest, Multi-Layer Perceptron, Support Vector Machine, XGBoost, LightGBM, VTC, and STC. Logistic Regression showed the best overall performance in these experiments. Finally, an XAI technique was used to present, in visual form, information intended to help each student.
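The correlation-based feature selection mentioned in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' procedure: the use of the Pearson coefficient, the 0.2 cutoff, the feature names, and the toy data are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(features, target, threshold=0.2):
    """Keep features whose absolute correlation with the target meets the threshold.

    The threshold value is an assumption made for this sketch.
    """
    scores = {name: pearson(values, target) for name, values in features.items()}
    return {name: r for name, r in scores.items() if abs(r) >= threshold}


# Hypothetical toy data: six students, label 1 = at-risk.
candidate_features = {
    "absences": [0, 2, 10, 8, 1, 12],
    "study_time": [4, 3, 1, 1, 4, 2],
    "club_member": [1, 2, 2, 1, 1, 1],  # deliberately unrelated to the label
}
at_risk = [0, 0, 1, 1, 0, 1]

selected = select_features(candidate_features, at_risk)
print(sorted(selected))
```

On this toy data the two features that track the label are retained and the unrelated one is dropped. In practice the cutoff would need to be justified against the data at hand.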






Acknowledgements
This work was supported by the National Research Foundation (NRF), Korea, under the project BK21 FOUR.
Ethics declarations
Conflict of interest
None.
Appendix. Features and related literature
Table 8 shows the features that affect student performance and the studies in which each feature is used. Some abbreviated feature names were expanded to clarify their meaning (for example, “Medu” was changed to “MotherEducation”). Where studies used different names for features with the same meaning, the names were merged under one name (for example, “low income” and “income” were merged under “income”).
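The renaming and merging of feature names described above can be pictured as a small lookup step. The sketch below is only illustrative: the two mappings come from the examples in the text, while the study labels, the helper name `harmonize`, and the data layout are hypothetical.

```python
# Alias table: only the "Medu" and "low income"/"income" entries are taken
# from the text; a real table would list every renamed or merged feature.
CANONICAL_NAMES = {
    "Medu": "MotherEducation",
    "low income": "income",
    "income": "income",
}


def harmonize(study_features):
    """Map each study's feature names to canonical names.

    Returns a dict of canonical feature name -> sorted list of studies
    that use that feature. Names missing from the alias table are kept as-is.
    """
    merged = {}
    for study, names in study_features.items():
        for name in names:
            canonical = CANONICAL_NAMES.get(name, name)
            merged.setdefault(canonical, set()).add(study)
    return {feature: sorted(studies) for feature, studies in merged.items()}


# Placeholder study labels, not real citations.
studies = {
    "Study A": ["Medu", "low income"],
    "Study B": ["income"],
}
table = harmonize(studies)
print(table)
```

Keeping the aliases in one dictionary makes each merge decision explicit and easy to audit against the original study.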
Cite this article
Jang, Y., Choi, S., Jung, H. et al. Practical early prediction of students’ performance using machine learning and eXplainable AI. Educ Inf Technol 27, 12855–12889 (2022). https://doi.org/10.1007/s10639-022-11120-6
DOI: https://doi.org/10.1007/s10639-022-11120-6