Abstract
Students’ performance prediction systems play a vital role in enhancing educational performance in universities, schools, and training centers. Big data can come from different sources such as examination centers, virtual courses, registration departments, and e-learning systems. Extracting meaningful knowledge from educational data is a complex task, so reducing the data dimensionality is needed. In this paper, we propose an enhanced binary genetic algorithm (EBGA) as a wrapper feature selection algorithm. A novel hybrid selection mechanism based on the k-means algorithm and the electromagnetic-like mechanism (EM) method is proposed. K-means groups the population into a set of clusters, while EM determines a value called the total force (TF) for each solution. Each cluster has an accumulated total force (ATF), i.e., the sum of the TFs of its solutions. The selection process picks the two solutions with the highest TF from the cluster that has the highest ATF. We combine the proposed EBGA with five different classifiers: k-Nearest Neighbors (k-NN), Decision Trees (DT), Naive Bayes (NB), Support Vector Machine (SVM), and Linear Discriminant Analysis (LDA). Two real case studies obtained from the UCI Machine Learning Repository are used in this paper. The results show that the proposed approach enhances the performance of the binary genetic algorithm. Moreover, the performance of all classifiers improves by between \(1\%\) and \(11\%\).
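The selection step described in the abstract can be illustrated with a minimal Python sketch. It assumes that the cluster label of each solution (for example, from k-means) and its TF value (from the EM method) have already been computed; how TF is derived is not specified in the abstract and is not modeled here. The function name `select_parents` and the sample values are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def select_parents(labels, total_force):
    """Return (best_cluster, parent_indices).

    labels[i]      -- cluster id of solution i (assumed to come from k-means)
    total_force[i] -- TF value of solution i (assumed to come from the EM method)
    """
    # Accumulated total force (ATF): sum of the TF values in each cluster.
    atf = defaultdict(float)
    for cluster_id, tf in zip(labels, total_force):
        atf[cluster_id] += tf

    # The cluster with the highest ATF supplies the parents.
    best_cluster = max(atf, key=atf.get)

    # Rank that cluster's members by TF (descending) and keep the top two.
    members = [i for i, c in enumerate(labels) if c == best_cluster]
    members.sort(key=lambda i: total_force[i], reverse=True)
    return best_cluster, members[:2]


# Hypothetical population: six solutions spread over three clusters.
labels = [0, 0, 1, 1, 1, 2]
total_force = [0.9, 0.1, 0.4, 0.5, 0.3, 0.8]
cluster, parents = select_parents(labels, total_force)
print(cluster, parents)
```

In this example cluster 1 has the largest ATF, so both parents are drawn from it even though the single strongest solution sits in cluster 0. If the winning cluster holds only one solution, this sketch returns a single index; how the paper handles that case is not stated in the abstract.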
Acknowledgements
The authors would like to acknowledge Taif University Researchers Supporting Project Number (TURSP-2020/125), Taif University, Taif, Saudi Arabia.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
About this article
Cite this article
Shreem, S.S., Turabieh, H., Al Azwari, S. et al. Enhanced binary genetic algorithm as a feature selection to predict student performance. Soft Comput 26, 1811–1823 (2022). https://doi.org/10.1007/s00500-021-06424-7
DOI: https://doi.org/10.1007/s00500-021-06424-7