Abstract
This paper presents a comparative evaluation of classification algorithms using the Waikato Environment for Knowledge Analysis (WEKA) software. The main goal is to determine which predictive modelling technique is best suited to classifying breast cancer recurrence. The dataset consists of 286 instances (201 in the no-recurrence class and 85 in the recurrence class) and 10 attributes. Naïve Bayes, J48, K*, Random Forest, Multilayer Perceptron (MLP) and Support Vector Machine (SVM) models are compared under different parameter settings. Model performance is assessed with the following evaluation metrics: accuracy, precision, sensitivity, specificity, mean absolute error, ROC curves and AUC values. The contribution of each attribute to the classification models is assessed by measuring information gain. The J48 and SVM models achieve the highest accuracies, 75.5% and 79.6%, respectively. The SVM model also shows the highest sensitivity (99%) and the lowest mean absolute error, while the MLP model obtains the highest precision (79%). Furthermore, the information gain analysis reveals that the degree of tumour malignancy contributes more than any other attribute to the recurrence of breast cancer.
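As an illustration of the evaluation metrics and the information gain measure named above, the following is a minimal Python sketch. It is not the WEKA implementation used in the study; the function names and the eight-record toy sample are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math
from collections import Counter


def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, precision, sensitivity and specificity from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
    }


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in class entropy after splitting on a categorical attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy sample (NOT the study's data): tumour grade vs. recurrence.
grade = ["1", "1", "2", "2", "3", "3", "3", "3"]
outcome = ["no", "no", "no", "no", "yes", "yes", "yes", "no"]

# Confusion counts for the toy rule "predict recurrence when grade is 3".
metrics = confusion_metrics(tp=3, fp=1, tn=4, fn=0)
gain = information_gain(grade, outcome)

print(metrics)
print(f"information gain of grade: {gain:.4f} bits")
```

An attribute with a higher information gain separates the two classes more cleanly, which is the sense in which one attribute is said to contribute more than the others.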
Acknowledgements
This breast cancer domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data. Please include this citation if you plan to use this database.
Cite this article
Mikhailova, V., Anbarjafari, G. Comparative analysis of classification algorithms on the breast cancer recurrence using machine learning. Med Biol Eng Comput 60, 2589–2600 (2022). https://doi.org/10.1007/s11517-022-02623-y
DOI: https://doi.org/10.1007/s11517-022-02623-y