Predicting Freshmen Attrition in Computing Science using Data Mining

Naseem, Mohammed; Chaudhary, Kaylash; Sharma, Bibhya

doi:10.1007/s10639-022-11018-3

Published: 04 April 2022

Education and Information Technologies, Volume 27, pages 9587–9617 (2022)

Abstract

The need for a knowledge-based society has driven an increasing demand for higher education around the globe. Recently, demand for Computer Science professionals has risen with the growing use of ICT in the business, health and education sectors. Enrollment numbers in Computer Science undergraduate programmes are usually high, but many of these students drop out of or abandon their programmes, leading to a shortage of Computer Science professionals in the job market. One way to diminish, if not completely eradicate, this problem is to identify students who are at risk of dropping out and provide them with special intervention programmes that help them remain in their programmes until graduation. In this paper, data mining techniques were used to build predictive models that can identify student dropout in Computer Science programmes, focusing specifically on freshmen attrition, since a significant share of dropouts occurs in the first year of university studies. The predictive models were built for three stages of the first academic year using five classification algorithms: Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, and K-Nearest Neighbour. The models used the past five years of institutional data stored in the university's repositories. Results show that the Naïve Bayes model performed best in stage 1, with an AUC of 0.6132, but in stages 2 and 3 the Logistic Regression models performed better overall, with AUCs of 0.7523 and 0.8902, respectively.
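
For readers unfamiliar with the AUC metric reported above, it can be read as the probability that a randomly chosen student who dropped out receives a higher predicted risk score than a randomly chosen student who stayed. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code: the function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study's data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    labels: 1 for a student who dropped out, 0 for one who stayed.
    scores: a model's predicted dropout risk (higher means more at risk).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")

    # Count (dropout, non-dropout) pairs in which the dropout is scored
    # higher; a tie counts as half a correctly ordered pair.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data for illustration only (not from the study).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")  # 12 of 15 pairs are ordered correctly: 0.8000
```

Under this reading, an AUC of 0.5 corresponds to ranking students no better than chance and 1.0 to a perfect ranking, which is why the stage 3 value of 0.8902 indicates a considerably stronger model than the stage 1 value of 0.6132.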


Acknowledgements

Part of the results was presented at the 2019 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE) in Melbourne, VIC, Australia.

Anonymised raw data used in this study can be obtained from this link: https://github.com/mohd-naseem/Student-Attrition/blob/main/Student%20attrition.csv

Author information

Authors and Affiliations

School of Information Technology, Engineering, Mathematics and Physics, The University of the South Pacific, Suva, Fiji
Mohammed Naseem, Kaylash Chaudhary & Bibhya Sharma

Corresponding author

Correspondence to Mohammed Naseem.

Ethics declarations

Conflict of Interest

None.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Naseem, M., Chaudhary, K. & Sharma, B. Predicting Freshmen Attrition in Computing Science using Data Mining. Educ Inf Technol 27, 9587–9617 (2022). https://doi.org/10.1007/s10639-022-11018-3

Received: 25 August 2021
Accepted: 22 March 2022
Published: 04 April 2022
Issue Date: August 2022
DOI: https://doi.org/10.1007/s10639-022-11018-3
