Academic performance warning system based on data driven for higher education

  • Original Article
  • Neural Computing and Applications

Abstract

Academic probation at universities has become a pressing concern in recent years, as many students face its severe consequences. We investigate how to reduce the number of students placed on probation. Our work draws on large-scale data from the education sector and modern machine learning techniques to build an academic warning system. The system is based on academic performance, which directly reflects a student’s academic probation status at the university. In the course of this research, we constructed a dataset extracted and developed from raw data sources, containing a wealth of information about students, subjects, and scores. Using feature generation techniques and feature selection strategies, we built a dataset with many features that are useful for predicting students’ academic warning status. Notably, the dataset is flexible and scalable: we provide detailed calculation formulas whose inputs are available at any university or college in Vietnam, so any institution can reuse the dataset or reconstruct a similar one from its own raw academic database. Moreover, we experimented with various combinations of data, imbalanced-data handling techniques, and model selection techniques to identify suitable machine learning algorithms for the best possible warning system. As a result, we propose a two-stage academic performance warning system for higher education, which achieves an F2-score of more than 74% at the beginning of the semester using a Support Vector Machine and more than 92% before the final examination using LightGBM.
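For readers unfamiliar with the F2-score reported above, the following is a minimal sketch of the standard F-beta measure computed from confusion-matrix counts. With beta = 2, recall is weighted more heavily than precision, which is the usual reason to prefer this measure when missing an at-risk student costs more than a false alarm. The function name and the example counts are hypothetical illustrations, not values or code from the article.

```python
def fbeta_from_counts(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """Standard F-beta score for the positive (warned) class.

    tp: students correctly flagged, fp: students flagged unnecessarily,
    fn: at-risk students the model missed. beta > 1 favours recall.
    """
    b2 = beta * beta
    denominator = (1 + b2) * tp + b2 * fn + fp
    if denominator == 0:
        return 0.0  # no positive predictions and no positive labels
    return (1 + b2) * tp / denominator


# Hypothetical counts: precision = 80/120, recall = 80/100.
f2 = fbeta_from_counts(tp=80, fp=40, fn=20, beta=2.0)
f1 = fbeta_from_counts(tp=80, fp=40, fn=20, beta=1.0)
print(round(f2, 3))  # 0.769
print(round(f1, 3))  # 0.727
```

Because recall exceeds precision in this example, the F2-score is higher than the F1-score; the reverse holds when a model is precise but misses many at-risk students.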




Data Availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This research was supported by The VNUHCM-University of Information Technology’s Scientific Research Support Fund.

Author information

Corresponding author

Correspondence to Kiet Van Nguyen.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Grading System

See Table 9.

Table 9 Grade conversion table

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Duong, H.TH., Tran, L.TM., To, H.Q. et al. Academic performance warning system based on data driven for higher education. Neural Comput & Applic 35, 5819–5837 (2023). https://doi.org/10.1007/s00521-022-07997-6


  • DOI: https://doi.org/10.1007/s00521-022-07997-6
