Abstract
Breast cancer remains a global health challenge and a significant contributor to mortality worldwide. To address this issue, we review existing machine-learning algorithms for accurate breast cancer diagnosis. Our study evaluates Gaussian Naive Bayes, XGBoost, Support Vector Machine (SVM), Logistic Regression, Principal Component Analysis, Linear Discriminant Analysis, k-Nearest Neighbors, Random Forest, a Decision Tree Classifier, and an ensemble classifier. The study applies data preprocessing and balancing techniques to support the reliability of the analysis. We use two prominent datasets, the Wisconsin Breast Cancer Diagnosis (WBCD) dataset and the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, each partitioned with a 5-fold cross-validation strategy. The evaluation covers confusion matrices, accuracy, precision, recall, F1-score, Receiver Operating Characteristic (ROC) curves, and the area under the curve (AUC). In classifying benign and malignant tumors on the WBCD dataset, the SVM model performs best, with a detection accuracy of 99.27% and precision, recall, and F1-score of 99.28% each. On the WDBC dataset, the XGBoost model performs best, attaining accuracy, precision, recall, and F1-score values of 98.25%.
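The evaluation protocol summarized in the abstract (5-fold cross-validation, then metrics derived from a confusion matrix) can be illustrated with a minimal sketch. This is not the authors' code: the helper names `stratified_folds`, `confusion_counts`, and `binary_metrics` are hypothetical, the labels are toy values rather than WBCD or WDBC data, and treating label 1 as the malignant (positive) class is an assumption. The abstract does not say how the folds were drawn or how the metrics were averaged, so the stratified round-robin split and the plain binary metrics below are only one plausible reading.

```python
from collections import defaultdict


def stratified_folds(labels, k=5):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the indices of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the malignant class (assumption)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels only (1 = malignant, 0 = benign); not drawn from either dataset.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

folds = stratified_folds(y_true, k=5)
counts = confusion_counts(y_true, y_pred)
metrics = binary_metrics(*counts)

print("folds:", folds)
print("tp, fp, fn, tn:", counts)
print("metrics:", metrics)
```

In a full cross-validation run, each fold would serve once as the test set while a model is trained on the remaining four, and the per-fold metrics would then be aggregated. A library implementation of stratified splitting and metric computation would normally replace these hand-written helpers.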
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chen, M., Fan, W., Tang, W., Liu, T., Li, D., Dib, O. (2024). Review of Machine Learning Algorithms for Breast Cancer Diagnosis. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2023. Communications in Computer and Information Science, vol 2018. Springer, Singapore. https://doi.org/10.1007/978-981-97-0844-4_17
DOI: https://doi.org/10.1007/978-981-97-0844-4_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0843-7
Online ISBN: 978-981-97-0844-4
eBook Packages: Computer Science, Computer Science (R0)