Abstract
This study employs supervised machine learning algorithms to examine the factors that negatively affect academic performance among college students on probation (underperforming students). We applied the Knowledge Discovery in Databases (KDD) methodology to a sample of N = 6514 college students spanning 11 years (2009 to 2019), provided by a major public university in Oman. We used the Information Gain (InfoGain) algorithm to select the most effective features and ensemble methods, including Logit Boost, Vote, and Bagging, to compare accuracy with more robust algorithms. The algorithms were evaluated on performance metrics such as accuracy, precision, recall, F-measure, and the ROC curve, and then validated using 10-fold cross-validation. The main factors identified as affecting student academic achievement were study duration at the university and prior performance in secondary school. In the experimental results, these features were consistently ranked as the top factors that negatively impacted academic performance. The study also indicated that gender, estimated graduation year, cohort, and academic specialization contributed significantly to whether a student was on probation. Domain experts and other students were involved in verifying some of the results. The theoretical and practical implications of this study are discussed.
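To illustrate the kind of feature ranking that Information Gain performs, the following is a minimal sketch in plain Python. It is not the authors' implementation: the toy records, the feature names (`long_study_duration`, `gender`), and the helper functions are hypothetical and chosen only to show how a feature that separates probation from non-probation students scores higher than one that does not.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the weighted label entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy records (NOT the study's data): 1 = on probation, 0 = not.
on_probation = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "long_study_duration": ["yes", "yes", "yes", "yes", "no", "no", "no", "no"],
    "gender": ["m", "f", "m", "f", "m", "f", "m", "f"],
}

# Rank features from most to least informative about the probation label.
ranking = sorted(
    ((name, information_gain(values, on_probation)) for name, values in features.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy data the first feature splits the labels perfectly and the second carries no information about them, so the ranking places `long_study_duration` first. On real records the gains would fall between these extremes, and a cut-off or top-k rule would be needed to choose which features to keep.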











Data Availability
The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.
Ethics declarations
Conflict of interest
None
About this article
Cite this article
Al-Alawi, L., Al Shaqsi, J., Tarhini, A. et al. Using machine learning to predict factors affecting academic performance: the case of college students on academic probation. Educ Inf Technol 28, 12407–12432 (2023). https://doi.org/10.1007/s10639-023-11700-0
DOI: https://doi.org/10.1007/s10639-023-11700-0