Applying Data Mining Techniques to Improve Breast Cancer Diagnosis

Diz, Joana; Marreiros, Goreti; Freitas, Alberto

doi:10.1007/s10916-016-0561-y

Systems-Level Quality Improvement
Published: 06 August 2016

Volume 40, article number 203, (2016)
Journal of Medical Systems

Joana Diz¹,
Goreti Marreiros² &
Alberto Freitas¹,³

Abstract

Computer-aided diagnosis systems are increasingly being developed in breast cancer research with the aim of reducing false positives in diagnostic tests. In this work, we present a data mining approach that may support oncologists in breast cancer classification and diagnosis. The study compares two breast cancer datasets and seeks the best methods for predicting benign/malignant lesions, classifying breast density, and identifying findings (mass/microcalcification distinction). To carry out these tasks, two texture feature extraction matrices were implemented in Matlab, and the extracted features were classified with data mining algorithms in WEKA. The results showed good accuracy for each class: 89.3 to 64.7 % for benign/malignant; 75.8 to 78.3 % for dense/fatty tissue; and 71.0 to 83.1 % for finding identification. Among the classifiers tested, Naive Bayes was the best at identifying mass texture, and Random Forests was the first or second best classifier for the majority of the tested groups.
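
The abstract does not say which two texture matrices were used, so the following is only an illustrative sketch of the general idea, assuming a gray-level co-occurrence matrix (GLCM) as one plausible choice. It is written in Python rather than the authors' Matlab, and the function names `glcm` and `texture_features`, the single pixel offset, and the three descriptors (contrast, energy, and one common definition of homogeneity) are all assumptions made for illustration, not the authors' implementation.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric gray-level co-occurrence matrix for one offset.

    `image` is a list of rows of integer gray levels in [0, levels).
    The offset (dx, dy) and the symmetric counting are illustrative choices.
    """
    rows, cols = len(image), len(image[0])
    counts = [[0.0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count each pair in both directions
    total = sum(map(sum, counts))
    if total == 0:
        return counts
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Compute a few texture descriptors from a normalized GLCM `p`."""
    contrast = energy = homogeneity = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            contrast += v * (i - j) ** 2          # weights large gray-level jumps
            energy += v * v                        # high for uniform textures
            homogeneity += v / (1.0 + abs(i - j))  # high when neighbors are similar
    return {"contrast": contrast, "energy": energy, "homogeneity": homogeneity}


# Hypothetical 4x4 patch quantized to 4 gray levels.
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = texture_features(glcm(patch, levels=4))
```

In a pipeline like the one the abstract outlines, a feature vector of this kind would be computed per region of interest and exported (for example as a table of values with a class label) for a classifier such as Naive Bayes or Random Forests to consume.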

Acknowledgments

The authors would like to thank the Breast Cancer Digital Repository Consortium of the University of Porto, and in particular Doctor Mário Vaz and Doctor Miguel Guevara, who granted access to the BCDR dataset.

This work is supported by FEDER Funds through the COMPETE program and by National Funds through FCT under the project UID/EEA/00760/2013.

Author information

Authors and Affiliations

1. CINTESIS - Center for Health Technology and Services Research, Faculty of Medicine, University of Porto, Porto, Portugal
   Joana Diz & Alberto Freitas
2. GECAD - Research Group on Intelligent Engineering and Computing for Advanced Innovation and Development, Institute of Engineering-Polytechnic of Porto, Porto, Portugal
   Goreti Marreiros
3. CIDES - Department of Health Information and Decision Sciences, Faculty of Medicine, University of Porto, Porto, Portugal
   Alberto Freitas

Corresponding author

Correspondence to Joana Diz.

Additional information

This article is part of the Topical Collection on Systems-Level Quality Improvement

About this article

Cite this article

Diz, J., Marreiros, G. & Freitas, A. Applying Data Mining Techniques to Improve Breast Cancer Diagnosis. J Med Syst 40, 203 (2016). https://doi.org/10.1007/s10916-016-0561-y

Received: 01 March 2016
Accepted: 25 July 2016
Published: 06 August 2016
DOI: https://doi.org/10.1007/s10916-016-0561-y
