Applying Data Mining Techniques to Improve Breast Cancer Diagnosis

  • Systems-Level Quality Improvement
Journal of Medical Systems

Abstract

In breast cancer research, more computer-aided diagnosis systems than ever are being developed with the aim of reducing false positives in diagnostic tests. In this work, we present a data mining approach that may support oncologists in breast cancer classification and diagnosis. The study compares two breast cancer datasets and identifies the best methods for predicting benign/malignant lesions, classifying breast density, and identifying the type of finding (mass versus microcalcification). To carry out these tasks, two texture feature extraction matrices were implemented in Matlab, and the resulting features were classified with data mining algorithms in WEKA. Reported accuracy for each task was 89.3 to 64.7 % (benign/malignant), 75.8 to 78.3 % (dense/fatty tissue), and 71.0 to 83.1 % (finding identification). Among the classifiers tested, Naive Bayes was the best at identifying mass texture, and Random Forests was the best or second-best classifier for the majority of tested groups.
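To make the texture-extraction step more concrete, the following is a minimal sketch of one common texture matrix, the gray-level co-occurrence matrix (GLCM), with three features derived from it. The abstract does not name the two matrices the study used, so the choice of GLCM, the function names `glcm` and `texture_features`, the pixel offset, and the toy 4×4 patch are all assumptions made for illustration. The study itself used Matlab and WEKA, not Python.

```python
def glcm(image, dx=1, dy=0, levels=4):
    """Normalised gray-level co-occurrence matrix for a single pixel offset.

    `image` is a list of rows of integer gray levels in [0, levels).
    Illustrative only: not the feature extractor used in the study.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # Count how often level a has level b at the given offset.
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("offset leaves no valid pixel pairs")
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Contrast, energy and homogeneity of a normalised GLCM `p`."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        "energy": sum(p[i][j] ** 2 for i, j in cells),
        "homogeneity": sum(p[i][j] / (1 + abs(i - j)) for i, j in cells),
    }


# Hypothetical 4x4 patch quantised to four gray levels.
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
matrix = glcm(patch)
features = texture_features(matrix)
print(features)
```

In a pipeline like the one the abstract outlines, a feature vector of this kind would be computed per region of interest and exported for a classifier such as Naive Bayes or Random Forests. The export step is omitted here.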

Acknowledgments

The authors would like to thank the Breast Cancer Digital Repository Consortium of the University of Porto, and in particular Doctor Mário Vaz and Doctor Miguel Guevara, who granted access to the BCDR dataset.

This work is supported by FEDER Funds through the COMPETE program and by National Funds through FCT under the project UID/EEA/00760/2013.

Author information

Corresponding author

Correspondence to Joana Diz.

Additional information

This article is part of the Topical Collection on Systems-Level Quality Improvement

Cite this article

Diz, J., Marreiros, G. & Freitas, A. Applying Data Mining Techniques to Improve Breast Cancer Diagnosis. J Med Syst 40, 203 (2016). https://doi.org/10.1007/s10916-016-0561-y

  • DOI: https://doi.org/10.1007/s10916-016-0561-y
