Abstract
This study attempts to predict secondary school students’ performance in English and Mathematics using data mining (DM) techniques. The learning of English and Mathematics is a concern in many countries. The study aims to provide insights into the predictors of students’ performance in English and Mathematics, the characteristics of students with different levels of performance, the most effective DM technique for predicting students’ performance, and the relationship between the two subjects. It employed the archival data of students who were 16 years old in 2019 and sat for the Malaysian Certificate of Education (MCE) examination in 2021. Three main factors, namely students’ past academic performance, demographics, and psychological attributes, were scrutinized to identify their impact on the prediction. The DM process was carried out in the Orange software. Decision Tree (DT) rules were used to determine the characteristics of students with low, moderate, and high performance in English and Mathematics. DT and Naïve Bayes (NB) showed the best predictive performance for English and Mathematics, respectively. Such characteristics and predictions may inform appropriate interventions to improve students’ performance in these subjects. The study identified students’ past academic performance as the most critical predictor, along with a few demographic and psychological attributes. By examining the top predictors derived from four different types of classifiers, the study found that students’ past Mathematics performance predicts their MCE English performance and that their past English performance predicts their MCE Mathematics performance. This finding indicates that students’ performance in the two subjects is interrelated.
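To make the NB result more concrete, the sketch below shows how a categorical Naïve Bayes classifier can assign a low, moderate, or high performance band from categorical predictors. It is a minimal, stdlib-only illustration, not the Orange implementation used in the study. The attribute names (`past_math`, `past_english`, `gender`), the toy records, and the Laplace smoothing parameter `alpha` are hypothetical assumptions chosen for illustration; they are not the study's variables or data.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy records: (attributes, MCE performance band).
# Attribute names and values are illustrative assumptions, not the study's data.
TRAIN = [
    ({"past_math": "A", "past_english": "A", "gender": "F"}, "high"),
    ({"past_math": "A", "past_english": "B", "gender": "M"}, "high"),
    ({"past_math": "B", "past_english": "B", "gender": "F"}, "moderate"),
    ({"past_math": "B", "past_english": "C", "gender": "M"}, "moderate"),
    ({"past_math": "C", "past_english": "C", "gender": "M"}, "low"),
    ({"past_math": "C", "past_english": "B", "gender": "F"}, "low"),
]


def train_nb(records):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(label for _, label in records)
    value_counts = defaultdict(Counter)  # (label, attr) -> Counter of values
    domains = defaultdict(set)           # attr -> set of values seen in training
    for attrs, label in records:
        for attr, value in attrs.items():
            value_counts[(label, attr)][value] += 1
            domains[attr].add(value)
    return class_counts, value_counts, domains


def predict_nb(model, attrs, alpha=1.0):
    """Return the band with the highest log-posterior, using Laplace smoothing."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior P(band)
        for attr, value in attrs.items():
            k = len(domains[attr])  # number of distinct training values for attr
            count = value_counts[(label, attr)][value]
            # Smoothed log-likelihood P(value | band); alpha > 0 avoids log(0).
            score += math.log((count + alpha) / (n_label + alpha * k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(TRAIN)
print(predict_nb(model, {"past_math": "A", "past_english": "A", "gender": "M"}))  # high
```

Working in log space keeps the product of many small conditional probabilities numerically stable, and the smoothing term keeps an unseen attribute value from zeroing out a whole class.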
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.
Acknowledgements
This work is supported by the Malaysian Ministry of Higher Education, Fundamental Research Grant Scheme, FRGS/1/2020/SS10/UNIMAS/01/1, and UNIMAS Zamalah Scholarship.
Funding
This work is funded by the Malaysian Ministry of Higher Education, Fundamental Research Grant Scheme, FRGS/1/2020/SS10/UNIMAS/01/1, and UNIMAS Zamalah Scholarship.
Ethics declarations
Ethics approval statement
The study obtained approval from the Education Policy Research and Development Division, Ministry of Education, Malaysia to use the archival data from the schools involved.
Conflict of interest
The authors declare no potential conflict of interest.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Roslan, M.H.B., Chen, C.J. Predicting students’ performance in English and Mathematics using data mining techniques. Educ Inf Technol 28, 1427–1453 (2023). https://doi.org/10.1007/s10639-022-11259-2
DOI: https://doi.org/10.1007/s10639-022-11259-2