Abstract
Increasing productivity in educational systems is of great importance. Researchers are keen to predict students' academic performance in order to enhance the overall productivity of the educational system by effectively identifying students whose performance is below average. This universal concern has been combined with data science, leading to the creation of an interdisciplinary research area called Educational Data Mining. One of the recent issues addressed by researchers is training models that generalize across aspects such as gender, major, and geography. Therefore, in this research we use machine learning methods to predict student performance, with an emphasis on training models that generalize geographically. For this purpose, a questionnaire containing 37 questions was designed, through which 536 responses were collected, including 111 international and 425 domestic responses. According to the literature, student performance is mostly determined based on the GPA (grade point average) of the entire course. In this research, information about the GPA of respondents in undergraduate and graduate courses was collected in the form of three classes. After a final review of the models employed in previous studies, the main models selected and used for classification were SVM, CNN, AdaBoost, RF, and DT. Feature selection is performed using XGBoost, random forest, and SVM-L1. The main issue investigated in this study is the generalizability of models trained on domestic (Iranian) data and tested on international (non-Iranian) data. Experimental results show that the best models trained on the dataset collected in this research generalized to international data when compared with the outcomes of base models trained and tested on domestic data. Meanwhile, the random forest and CNN models show the best performance, with average accuracy and F-score of 73.5 and 68.5, respectively.
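The evaluation protocol described above (fit on domestic responses only, score on international responses, report accuracy and F-score over three GPA classes) can be illustrated with a minimal sketch. This is not the authors' code. The rows are invented toy data, the field names `origin` and `gpa_class` are hypothetical, a majority-class baseline stands in for the actual classifiers, and macro averaging of the F-score is an assumption, since the abstract does not state which averaging was used.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F-scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F-score for one class is 2TP / (2TP + FP + FN); define it as 0 if undefined.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy rows invented for illustration; they are not the study's data.
rows = [
    {"origin": "domestic", "gpa_class": "high"},
    {"origin": "domestic", "gpa_class": "mid"},
    {"origin": "domestic", "gpa_class": "mid"},
    {"origin": "domestic", "gpa_class": "low"},
    {"origin": "domestic", "gpa_class": "mid"},
    {"origin": "international", "gpa_class": "mid"},
    {"origin": "international", "gpa_class": "high"},
    {"origin": "international", "gpa_class": "mid"},
    {"origin": "international", "gpa_class": "low"},
]

# Split by geography: domestic labels for fitting, international labels for testing.
domestic = [r["gpa_class"] for r in rows if r["origin"] == "domestic"]
international = [r["gpa_class"] for r in rows if r["origin"] == "international"]

# Placeholder "model" fitted on domestic data only: always predict its majority class.
majority_class = Counter(domestic).most_common(1)[0][0]
predictions = [majority_class] * len(international)

acc = accuracy(international, predictions)
f1 = macro_f1(international, predictions)
print(f"accuracy={acc:.3f}, macro F-score={f1:.3f}")
```

The gap between accuracy and F-score in this toy case shows why both metrics are worth reporting when the three classes are imbalanced: a model that ignores the minority classes can still score well on accuracy alone.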




Data availability
The datasets generated and/or analyzed during the current study are not publicly available due to students’ privacy considerations but are available from the corresponding author on reasonable request.
Acknowledgements
The authors are truly thankful to the people who anonymously participated in responding to the questionnaire provided in this research.
Author information
Contributions
The authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Limited data sample
Forty rows of the dataset are included in the zip file.
Code file
The code file is included in the zip file. Explanations and considerations are written in the code file as notes.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Parhizkar, A., Tejeddin, G. & Khatibi, T. Student performance prediction using datamining classification algorithms: Evaluating generalizability of models from geographical aspect. Educ Inf Technol 28, 14167–14185 (2023). https://doi.org/10.1007/s10639-022-11560-0
DOI: https://doi.org/10.1007/s10639-022-11560-0