Using machine learning to predict factors affecting academic performance: the case of college students on academic probation

Education and Information Technologies

Abstract

This study employs supervised machine learning algorithms to examine factors that negatively affect academic performance among college students on probation (underperforming students). We applied the Knowledge Discovery in Databases (KDD) methodology to a sample of N = 6514 college students covering 11 years (2009 to 2019), provided by a major public university in Oman. We used the Information Gain (InfoGain) algorithm to select the most informative features and compared accuracy using more robust ensemble methods, including LogitBoost, Vote, and Bagging. The algorithms were evaluated on accuracy, precision, recall, F-measure, and the ROC curve, and validated using 10-fold cross-validation. The main factors found to affect student academic achievement were study duration at the university and previous performance in secondary school. In the experimental results, these features were consistently ranked as the top factors that negatively affected academic performance. The study also indicated that gender, estimated graduation year, cohort, and academic specialization contributed significantly to whether a student was on probation. Domain experts and other students were involved in verifying some of the results. The theoretical and practical implications of this study are discussed.
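To make the feature-ranking step concrete, the following is a minimal sketch of how information gain can be computed for categorical features and used to rank them. It is not the authors' implementation: the function names (`entropy`, `information_gain`), the feature names, and the eight toy records are all hypothetical and chosen only for illustration; none of the values come from the study's data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the weighted entropy left after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy records (invented for illustration, not taken from the study).
records = [
    {"study_duration": "long", "school_grade": "low", "gender": "m", "probation": "yes"},
    {"study_duration": "long", "school_grade": "low", "gender": "f", "probation": "yes"},
    {"study_duration": "long", "school_grade": "low", "gender": "m", "probation": "yes"},
    {"study_duration": "long", "school_grade": "high", "gender": "f", "probation": "yes"},
    {"study_duration": "short", "school_grade": "high", "gender": "m", "probation": "no"},
    {"study_duration": "short", "school_grade": "high", "gender": "f", "probation": "no"},
    {"study_duration": "short", "school_grade": "high", "gender": "m", "probation": "no"},
    {"study_duration": "short", "school_grade": "low", "gender": "f", "probation": "no"},
]

labels = [r["probation"] for r in records]
features = ["study_duration", "school_grade", "gender"]

# Score every feature, then rank from most to least informative.
gains = {f: information_gain([r[f] for r in records], labels) for f in features}
ranked = sorted(features, key=lambda f: gains[f], reverse=True)

for name in ranked:
    print(f"{name}: {gains[name]:.4f}")
```

In this toy set, a feature that separates the two classes perfectly receives the full label entropy as its gain, while a feature that is evenly distributed across both classes receives a gain of zero. A real pipeline would also need to discretize numeric attributes before applying this measure.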

Data Availability

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

Author information

Corresponding author

Correspondence to Ali Tarhini.

Ethics declarations

Conflict of interest

None

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Al-Alawi, L., Al Shaqsi, J., Tarhini, A. et al. Using machine learning to predict factors affecting academic performance: the case of college students on academic probation. Educ Inf Technol 28, 12407–12432 (2023). https://doi.org/10.1007/s10639-023-11700-0
