Abstract
Analyzing factors related to learning progress, such as coursework scores, the number of occasions of plagiarism or failure, and time spent at the library, helps to identify factors that can reduce dropouts. Many researchers have used traditional methods to predict students’ academic performance, but only a few studies have developed hybrid approaches that combine classification and prediction. This study assesses students’ performance using a hybrid method that combines a decision tree with multiple linear regression to predict their likelihood of graduating. Specifically, the decision tree model classifies students into the ‘Adequate’ and ‘Fair’ classes, and multiple linear regression models then predict their future Cumulative Grade Point Average (CGPA). The statistical evaluation shows that the first and second coursework scores have a significant impact on the results. Other attributes, such as time spent on campus or the number of times students failed in the previous semester, should also be considered. The decision tree model’s accuracy is 0.47, and the correlation coefficient of the multiple linear regression models is 0.52. The outcome of this research is an equation that assigns a specific weight to each score in determining the final result. This, in turn, would enable early and appropriate action from educators to raise the academic achievement of such students.
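The two-stage idea described in the abstract can be illustrated with a minimal sketch in plain Python. Everything in it is an assumption made for illustration rather than the authors' implementation: the toy data (two coursework scores per student), the function names, the use of a one-level tree split chosen by information gain in place of a full decision tree, and the choice to fit one least-squares regression per predicted class.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(labels.count(c) / n * math.log2(labels.count(c) / n)
                for c in set(labels))

def majority(labels):
    """Most frequent label; ties are broken alphabetically."""
    return max(sorted(set(labels)), key=labels.count)

def fit_stump(X, y):
    """One-level decision tree: pick the (feature, threshold) with the highest information gain."""
    base, best = entropy(y), None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i, row in enumerate(X) if row[j] <= t]
            right = [y[i] for i, row in enumerate(X) if row[j] > t]
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            if best is None or gain > best[0]:
                best = (gain, j, t, majority(left), majority(right))
    return best[1:]  # (feature index, threshold, left label, right label)

def classify(stump, row):
    j, t, left_label, right_label = stump
    return left_label if row[j] <= t else right_label

def fit_linear(X, y):
    """Ordinary least squares via the normal equations; returns [intercept, w1, ..., wk]."""
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y], solved by Gauss-Jordan elimination with pivoting.
    M = [[sum(a[p] * a[q] for a in A) for q in range(k)]
         + [sum(a[p] * t for a, t in zip(A, y))] for p in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]

def predict_linear(beta, row):
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def fit_hybrid(X, labels, cgpa):
    """Stage 1: classify. Stage 2: one regression per predicted class (an assumption)."""
    stump = fit_stump(X, labels)
    models = {}
    for cls in sorted(set(labels)):
        idx = [i for i, row in enumerate(X) if classify(stump, row) == cls]
        models[cls] = fit_linear([X[i] for i in idx], [cgpa[i] for i in idx])
    return stump, models

def predict_student(stump, models, row):
    label = classify(stump, row)
    return label, predict_linear(models[label], row)

# Hypothetical toy data: [first coursework score, second coursework score].
X = [[40, 50], [45, 70], [50, 40], [55, 65], [65, 60], [70, 85], [80, 70], [90, 95]]
labels = ["Fair"] * 4 + ["Adequate"] * 4
# Toy CGPA values generated from made-up linear weights, not from the study's data.
cgpa = [1.6, 1.875, 1.65, 1.975, 2.9, 3.25, 3.3, 3.75]

stump, models = fit_hybrid(X, labels, cgpa)
predicted = [predict_student(stump, models, row) for row in X]
accuracy = sum(p[0] == y for p, y in zip(predicted, labels)) / len(labels)
print("accuracy:", accuracy)
print("correlation:", pearson([p[1] for p in predicted], cgpa))
print("new student:", predict_student(stump, models, [75, 80]))
```

The fitted coefficients in `models` play the role of the weighted equation mentioned in the abstract: each coursework score receives its own weight in the predicted CGPA. On real data the two metrics would be computed on held-out students rather than on the training set as done here.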






Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Future Data and Security Engineering 2021” guest edited by Tran Khanh Dang.
Cite this article
Dang, T.K., Nguyen, H.H.X. A Hybrid Approach Using Decision Tree and Multiple Linear Regression for Predicting Students’ Performance Based on Learning Progress and Behavior. SN COMPUT. SCI. 3, 393 (2022). https://doi.org/10.1007/s42979-022-01251-5
DOI: https://doi.org/10.1007/s42979-022-01251-5