A Hybrid Approach Using Decision Tree and Multiple Linear Regression for Predicting Students’ Performance Based on Learning Progress and Behavior

  • Original Research
  • Published in SN Computer Science

Abstract

Analyzing factors related to learning progress, such as coursework scores, the number of occasions of plagiarism or failure, and time spent at the library, helps identify the factors that can reduce dropout rates. Many researchers have used traditional methods to predict students' academic performance, and a few studies have developed hybrid approaches that combine classification and prediction. This study assesses students’ performance using a hybrid method that combines a decision tree with multiple linear regression to predict their likelihood of graduating. Specifically, the decision tree model classifies students into the ‘Adequate’ and ‘Fair’ classes, and multiple linear regression models then predict their future Cumulative Grade Point Average (CGPA). The statistical evaluation shows that the first and second coursework scores have a significant impact on the results. Other attributes, such as time spent on campus and the number of times a student failed in the previous semester, should also be considered. The decision tree model’s accuracy is 0.47, and the correlation coefficient of the multiple linear regression model is 0.52. The outcome of this research is an equation that weights each attribute's contribution to the final result. This, in turn, can support early and appropriate action by educators to improve students' academic achievement.
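The two-stage idea in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: the toy data, the two coursework features, the depth-1 tree (a decision stump split on Gini impurity), and the choice to fit one regression per class are all assumptions made for illustration, since the abstract does not state how the two stages are chained. The function names (`fit_stump`, `fit_ols`, `predict`) are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def majority(labels):
    """Most frequent label in a non-empty list."""
    return max(set(labels), key=labels.count)

def fit_stump(rows, labels):
    """Depth-1 decision tree: pick the (feature, threshold) with lowest weighted Gini."""
    best = None
    for j in range(len(rows[0])):
        # Skip the smallest value so both sides of the split are non-empty.
        for t in sorted({r[j] for r in rows})[1:]:
            left = [y for r, y in zip(rows, labels) if r[j] < t]
            right = [y for r, y in zip(rows, labels) if r[j] >= t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    return best[1:]  # (feature index, threshold, left label, right label)

def fit_ols(rows, targets):
    """Multiple linear regression via the normal equations.
    Returns [intercept, b1, ..., bk]."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

def predict(student, stump, models):
    """Stage 1: classify with the stump. Stage 2: apply that class's regression."""
    j, t, left_label, right_label = stump
    label = left_label if student[j] < t else right_label
    coef = models[label]
    cgpa = coef[0] + sum(b * x for b, x in zip(coef[1:], student))
    return label, cgpa

# Made-up toy data: (coursework 1, coursework 2), class label, CGPA.
data = [
    ((4, 5), "Fair", 1.8), ((5, 4), "Fair", 1.9),
    ((3, 6), "Fair", 1.7), ((5, 6), "Fair", 2.1),
    ((7, 8), "Adequate", 3.2), ((8, 7), "Adequate", 3.3),
    ((9, 9), "Adequate", 3.7), ((8, 9), "Adequate", 3.5),
]
rows = [d[0] for d in data]
labels = [d[1] for d in data]

stump = fit_stump(rows, labels)
# Assumption: one regression per class, trained on that class's students.
models = {
    c: fit_ols([r for r, y, _ in data if y == c], [g for _, y, g in data if y == c])
    for c in set(labels)
}

label, cgpa = predict((6, 7), stump, models)
print(label, round(cgpa, 2))
```

Each fitted coefficient list corresponds to a weighted equation of the form CGPA = b0 + b1·coursework1 + b2·coursework2. In practice a library such as scikit-learn would replace the hand-written stump and solver, and both stages would be evaluated on held-out data.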

Author information

Corresponding author

Correspondence to Huu Huong Xuan Nguyen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Future Data and Security Engineering 2021” guest edited by Tran Khanh Dang.

About this article

Cite this article

Dang, T.K., Nguyen, H.H.X. A Hybrid Approach Using Decision Tree and Multiple Linear Regression for Predicting Students’ Performance Based on Learning Progress and Behavior. SN COMPUT. SCI. 3, 393 (2022). https://doi.org/10.1007/s42979-022-01251-5

  • DOI: https://doi.org/10.1007/s42979-022-01251-5
