Application of Machine Learning Techniques to Predict Breast Cancer Survival

  • Conference paper
Multi-disciplinary Trends in Artificial Intelligence (MIWAI 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12832)

Abstract

Despite recent significant advances in big data analytics, there is substantial evidence that machine learning techniques can perform poorly when building prediction models. This research investigated the performance and effectiveness of six machine learning techniques, Naive Bayes (NB), PART, Random Forest (RF), Support Vector Machine (SVM), AdaBoost, and Bagging, in order to advance the existing understanding of model behavior with big data. A large hospital-based breast cancer dataset from the SEER data file, containing diagnostic information from 2005 to 2014, was used. To address outliers and class imbalance, we used C4.5 to eliminate outliers and the Synthetic Minority Oversampling TEchnique (SMOTE) to balance the dataset. Stratified 10-fold cross-validation was used to partition the dataset in order to reduce the bias and variance of the experimental results. Accuracy, G-mean (G), F-measure, and Matthews correlation coefficient (MCC) were employed as criteria for the overall performance of the models, while sensitivity, specificity, and precision were used to give a more detailed view of their performance. The experimental results indicate that RF is superior to NB, PART, SVM, AdaBoost, and Bagging on all criteria. In addition, models generated from balanced datasets with few outliers outperform those generated from the original dataset in terms of both detailed and overall performance.
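
The evaluation criteria named in the abstract can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch, not the authors' implementation: it assumes the common definitions G-mean = sqrt(sensitivity × specificity) and F-measure = F1 (the harmonic mean of precision and sensitivity), which the abstract does not spell out. The function name `evaluate_binary` and the example counts are hypothetical.

```python
import math


def _ratio(numerator, denominator):
    """Divide safely; return 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate_binary(tp, fn, fp, tn):
    """Compute the abstract's criteria from confusion-matrix counts.

    tp/fn/fp/tn are true positives, false negatives, false positives,
    and true negatives for the class treated as positive.
    """
    sensitivity = _ratio(tp, tp + fn)   # recall on the positive class
    specificity = _ratio(tn, tn + fp)   # recall on the negative class
    precision = _ratio(tp, tp + fp)
    accuracy = _ratio(tp + tn, tp + fn + fp + tn)

    # Assumed definition: geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(sensitivity * specificity)

    # Assumed definition: F1, the harmonic mean of precision and sensitivity.
    f_measure = _ratio(2 * precision * sensitivity, precision + sensitivity)

    # Matthews correlation coefficient; defined as 0.0 if any marginal is empty.
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    mcc = _ratio(tp * tn - fp * fn, mcc_denominator)

    return {
        "accuracy": accuracy,
        "g_mean": g_mean,
        "f_measure": f_measure,
        "mcc": mcc,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not results from the paper).
    for name, value in evaluate_binary(tp=90, fn=10, fp=20, tn=80).items():
        print(f"{name}: {value:.4f}")
```

G-mean and MCC are useful alongside accuracy on imbalanced data because both drop sharply when a model favors the majority class, whereas accuracy alone can stay high.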

Acknowledgments

This research was financially supported by the Faculty of Informatics, Mahasarakham University (Grant year 2019). The researchers would like to thank the SEER website for providing the data used to analyze the survival model of patients with breast cancer.

Author information

Corresponding author

Correspondence to Jaree Thomgkam.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Thomgkam, J., Sukmak, V., Klangnok, P. (2021). Application of Machine Learning Techniques to Predict Breast Cancer Survival. In: Chomphuwiset, P., Kim, J., Pawara, P. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2021. Lecture Notes in Computer Science, vol 12832. Springer, Cham. https://doi.org/10.1007/978-3-030-80253-0_13

  • DOI: https://doi.org/10.1007/978-3-030-80253-0_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-80252-3

  • Online ISBN: 978-3-030-80253-0

  • eBook Packages: Computer Science, Computer Science (R0)