DOI: 10.1145/3647444.3652473
Research article — ICIMMI Conference Proceedings

Ensemble Machine Learning Model Improves Prediction Accuracy for Academic Performance: A Comparative Study of Default ML vs. Boosting Algorithms

Published: 13 May 2024

Abstract

Overfitting is a common issue in machine learning, in which a model fits the training data well but generalizes poorly to unseen data; it can be mitigated with ensemble techniques such as bagging or boosting. Bagging trains multiple models in parallel and combines their predictions by majority voting, while boosting trains weak learners sequentially, reweighting the samples misclassified by earlier learners so that later models yield a more robust prediction. This research addresses the gap in selecting the best model for a classification problem by comparing the accuracy scores of default and boosting algorithms. The multiple linear regression summary statistics indicated that the predictor variables were significant, so prediction accuracy was further tested using ensemble machine learning models. Logistic regression scored lowest in both default (72%) and cross-validation (84%) accuracy, compared with random forest (82–94%), decision tree (81–82%), and k-nearest neighbors (83–88%). Among the boosting algorithms, gradient boosting achieved 93%, XGBoost 94%, and AdaBoost 93%. These accuracy scores were further validated by examining the confusion matrix, classification report, and ROC-AUC plot of each model.
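The comparison described above — scoring default classifiers against boosting ensembles via cross-validation — can be sketched as follows. This is a minimal illustration using scikit-learn on a synthetic dataset, not the paper's student-performance data or its exact hyperparameters; XGBoost is omitted here to keep the sketch to a single library.

```python
# Hedged sketch: compare default classifiers with boosting ensembles
# using 5-fold cross-validation accuracy (synthetic data, illustrative only).
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem standing in for the student dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "random_forest": RandomForestClassifier(random_state=42),
    "k_nearest_neighbors": KNeighborsClassifier(),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
    "adaboost": AdaBoostClassifier(random_state=42),
}

for name, model in models.items():
    # Mean accuracy across 5 folds approximates generalization performance
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```

A confusion matrix, classification report, and ROC-AUC curve (e.g. `sklearn.metrics.classification_report` and `RocCurveDisplay`) could then be produced per model to mirror the paper's validation step.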


Published In

ICIMMI '23: Proceedings of the 5th International Conference on Information Management & Machine Intelligence
November 2023
1215 pages

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICIMMI 2023
