Review of Machine Learning Algorithms for Breast Cancer Diagnosis

  • Conference paper
Data Mining and Big Data (DMBD 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2018)

Abstract

Breast cancer remains a global health challenge and a significant contributor to mortality worldwide. To address this issue, we review existing machine-learning algorithms for accurate breast cancer diagnosis. Our study evaluates a range of algorithms: Gaussian Naive Bayes, XGBoost, Support Vector Machine (SVM), Logistic Regression, Principal Component Analysis, Linear Discriminant Analysis, k-Nearest Neighbors, Random Forest, Decision Tree Classifier, and an ensemble classifier. The study relies on specialized data preprocessing and balancing techniques to support the reliability of the analysis. We use two prominent datasets, the Wisconsin Breast Cancer Diagnosis (WBCD) dataset and the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, each partitioned with a 5-fold cross-validation strategy. The evaluation covers confusion matrices, accuracy, precision, recall, F1 score, the area under the curve (AUC), and Receiver Operating Characteristic (ROC) curves. In classifying benign and malignant tumors on the WBCD dataset, the SVM model performs best, with a detection accuracy of 99.27% and precision, recall, and F1 score all at 99.28%. On the WDBC dataset, the XGBoost model is the best choice, attaining accuracy, precision, recall, and F1 score values of 98.25%.
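The metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`binary_metrics`, `k_fold_indices`), the toy labels, and the choice of unshuffled, unstratified contiguous folds are all assumptions made for illustration, and the abstract does not say how the reported scores were averaged across folds or classes.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating `positive` as the malignant class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> BinaryMetrics:
    """Compute accuracy, precision, recall and F1 from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds of (train, test) indices.

    Assumption: no shuffling or stratification, which a real study
    on imbalanced data would usually want.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, test))
        start = stop
    return folds


if __name__ == "__main__":
    # Toy labels (1 = malignant, 0 = benign); not data from the paper.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(y_true, y_pred))
    print([len(test) for _, test in k_fold_indices(12, k=5)])
```

In a 5-fold protocol of this kind, a model would be fitted on each `train` split and scored on the matching `test` split, with the per-fold metrics then aggregated, for example by averaging.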

Author information

Correspondence to Omar Dib.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Chen, M., Fan, W., Tang, W., Liu, T., Li, D., Dib, O. (2024). Review of Machine Learning Algorithms for Breast Cancer Diagnosis. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2023. Communications in Computer and Information Science, vol 2018. Springer, Singapore. https://doi.org/10.1007/978-981-97-0844-4_17

  • DOI: https://doi.org/10.1007/978-981-97-0844-4_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0843-7

  • Online ISBN: 978-981-97-0844-4

  • eBook Packages: Computer Science, Computer Science (R0)
