Abstract
Academic probation has become a pressing concern at universities in recent years, as many students face its severe consequences. We carried out this research to find ways to reduce the number of students placed on academic probation. Our work combines large-scale data from the education sector with modern machine learning techniques to build an academic warning system. The system is based on academic performance, which directly reflects students’ academic probation status at the university. During the research, we built a dataset extracted and developed from raw data sources, including a wealth of information about students, subjects, and scores. Using feature generation techniques and feature selection strategies, we constructed many features that are highly useful for predicting students’ academic warning status. Notably, the contributed dataset is flexible and scalable because we provide detailed calculation formulas whose inputs are available at any university or college in Vietnam. This allows any university to reuse the dataset or reconstruct a similar one from its own raw academic database. Moreover, we combined data in various ways, applied techniques for handling imbalanced data and for model selection, and studied which machine learning algorithms are suitable, in order to build the best possible warning system. As a result, we propose a two-stage academic performance warning system for higher education, with an F2-score of more than 74% at the beginning of the semester using a Support Vector Machine and more than 92% before the final examination using LightGBM.
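The abstract reports results as an F2-score. As a minimal sketch of what that measure computes, the snippet below evaluates the general F-beta score from confusion-matrix counts, where beta = 2 weights recall (catching at-risk students) more heavily than precision. The function name `fbeta_from_counts` and the counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def fbeta_from_counts(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from true positives, false positives and false negatives.

    F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP)
    With beta > 1, missed positives (FN) cost more than false alarms (FP).
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # Convention: return 0.0 when there are no positives and no predictions.
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical counts for the "at risk" class (illustration only).
tp, fp, fn = 80, 40, 20

precision = tp / (tp + fp)   # share of issued warnings that were correct
recall = tp / (tp + fn)      # share of at-risk students that were caught

f1 = fbeta_from_counts(tp, fp, fn, beta=1.0)
f2 = fbeta_from_counts(tp, fp, fn, beta=2.0)

print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f} F2={f2:.3f}")
```

Because recall exceeds precision in this illustrative case, F2 comes out higher than F1, which shows why a recall-weighted measure suits a warning system where a missed at-risk student is the costlier error.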














Data Availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This research was supported by The VNUHCM-University of Information Technology’s Scientific Research Support Fund.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Duong, H.TH., Tran, L.TM., To, H.Q. et al. Academic performance warning system based on data driven for higher education. Neural Comput & Applic 35, 5819–5837 (2023). https://doi.org/10.1007/s00521-022-07997-6
DOI: https://doi.org/10.1007/s00521-022-07997-6