Student performance prediction using datamining classification algorithms: Evaluating generalizability of models from geographical aspect

Education and Information Technologies

Abstract

Increasing productivity in educational systems is of great importance. Researchers are keen to predict students' academic performance in order to enhance the overall productivity of the educational system by effectively identifying students whose performance is below average. This universal concern has been combined with data science, leading to the creation of an interdisciplinary research area called Educational Data Mining. One issue recently addressed by researchers is training models that generalize across aspects such as gender, major, and geography. Therefore, in this research we use machine learning methods to predict student performance, with an emphasis on training models that generalize geographically. For this purpose, a questionnaire containing 37 questions was designed, through which 536 responses were collected, comprising 111 international and 425 domestic responses. According to the literature, student performance is mostly determined based on the GPA (grade point average) of the entire course. In this research, information about the GPA of respondents in undergraduate and graduate courses was collected in the form of three classes. After a final review of the models employed in previous studies, the main models selected and used for classification were SVM, CNN, AdaBoost, RF, and DT. Feature selection was performed using XGBoost, random forest, and SVML1. The main issue investigated in this study is the generalizability of models trained on domestic (Iranian) data and tested on international (non-Iranian) data. Experimental results show that the best models trained on the dataset collected in this research generalized when compared with the outcomes of base models trained and tested on domestic data. Random forest and CNN models showed the best performance, with average accuracy and F-score of 73.5 and 68.5, respectively.
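
The evaluation protocol described in the abstract (fit on domestic responses, score on international responses, report accuracy and F-score over three GPA classes) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the authors' code: the record fields `international` and `gpa_class`, the class names, the toy records, the majority-class baseline standing in for the actual classifiers, and the choice of macro averaging for the F-score.

```python
from collections import Counter

def split_by_origin(records):
    """Split records into (domestic, international) using a hypothetical flag."""
    domestic = [r for r in records if not r["international"]]
    international = [r for r in records if r["international"]]
    return domestic, international

def fit_majority_baseline(train):
    """Placeholder model: return the most frequent GPA class in the training set."""
    return Counter(r["gpa_class"] for r in train).most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted members contributes an F1 of 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy records, invented for illustration only.
records = [
    {"international": False, "gpa_class": "high"},
    {"international": False, "gpa_class": "high"},
    {"international": False, "gpa_class": "medium"},
    {"international": True, "gpa_class": "high"},
    {"international": True, "gpa_class": "low"},
    {"international": True, "gpa_class": "high"},
    {"international": True, "gpa_class": "medium"},
]

# Train on domestic responses only, then evaluate on international responses.
domestic, international = split_by_origin(records)
majority = fit_majority_baseline(domestic)
y_true = [r["gpa_class"] for r in international]
y_pred = [majority] * len(y_true)

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, ["low", "medium", "high"])
print(f"accuracy={acc:.3f} macro_f1={f1:.3f}")
```

The point of the sketch is the split: no international response is seen during fitting, so the reported scores measure how well a model built on one population carries over to the other. Swapping the baseline for a real classifier would leave the rest of the protocol unchanged.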

Data availability

The datasets generated and/or analyzed during the current study are not publicly available due to students’ privacy considerations but are available from the corresponding author on reasonable request.

Acknowledgements

The authors are truly thankful to the people who anonymously participated in responding to the questionnaire provided in this research.

Author information

Contributions

The authors read and approved the final manuscript.

Corresponding author

Correspondence to Amirmohammad Parhizkar.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Limited data sample

Forty rows of the dataset are included in the zip file.

Code file

The code file is included in the zip file. Code explanations and considerations are written as notes in the code file.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 34 KB)

Supplementary file2 (XLSX 15 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Parhizkar, A., Tejeddin, G. & Khatibi, T. Student performance prediction using datamining classification algorithms: Evaluating generalizability of models from geographical aspect. Educ Inf Technol 28, 14167–14185 (2023). https://doi.org/10.1007/s10639-022-11560-0

  • DOI: https://doi.org/10.1007/s10639-022-11560-0
