Predicting Freshmen Attrition in Computing Science using Data Mining

Education and Information Technologies

Abstract

The need for a knowledge-based society has driven an increasing demand for higher education around the globe. Recently, demand for Computer Science professionals has grown because of the rising use of ICT in the business, health and education sectors. Enrolment numbers in Computer Science undergraduate programmes are usually high, but many of these students drop out of or abandon their programmes, leading to a shortage of Computer Science professionals in the job market. One way to reduce, if not completely eradicate, this problem is to identify students who are at risk of dropping out and provide them with special intervention programmes that help them remain in their programmes until graduation. In this paper, data mining techniques were used to build predictive models that can identify student dropout in Computer Science programmes, focusing specifically on freshmen attrition, since a significant number of dropouts occur in the first year of university studies. The predictive models were built for three stages of the first academic year using five classification algorithms: Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, and K-Nearest Neighbour. The models used the past five years of institutional data stored in the university's repositories. Results show that the Naïve Bayes model performed best in stage 1, with an AUC of 0.6132, whereas in stages 2 and 3 the Logistic Regression models performed best overall, with AUCs of 0.7523 and 0.8902, respectively.
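To make the AUC figures above easier to interpret, the following is a minimal sketch of how an AUC value can be computed from a model's predicted dropout scores. It uses the pairwise-ranking definition: the fraction of (dropout, non-dropout) pairs in which the dropout student receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; they are not taken from the paper or its dataset, and the authors' actual tooling is not stated in this excerpt.

```python
def roc_auc(labels, scores):
    """Return the AUC: the probability that a randomly chosen positive
    (label 1) is scored above a randomly chosen negative (label 0).
    Ties between a positive and a negative count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = dropped out, 0 = retained.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.8]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

Under this definition an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported rise from 0.6132 in stage 1 to 0.8902 in stage 3 indicates that the later-stage models rank at-risk students considerably more reliably.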

Acknowledgements

Partial results of this study were presented at the 2019 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE) in Melbourne, VIC, Australia.

Anonymised raw data used in this study can be obtained from this link: https://github.com/mohd-naseem/Student-Attrition/blob/main/Student%20attrition.csv

Author information

Corresponding author

Correspondence to Mohammed Naseem.

Ethics declarations

Conflict of Interest

None.

About this article

Cite this article

Naseem, M., Chaudhary, K. & Sharma, B. Predicting Freshmen Attrition in Computing Science using Data Mining. Educ Inf Technol 27, 9587–9617 (2022). https://doi.org/10.1007/s10639-022-11018-3

  • DOI: https://doi.org/10.1007/s10639-022-11018-3
