Feature fusion based machine learning pipeline to improve breast cancer prediction

  • 1218: Engineering Tools and Applications in Medical Imaging
Multimedia Tools and Applications

Abstract

Early detection of malignant breast cancer can significantly improve patients' chances of survival. Analysing a non-invasive, non-radioactive modality such as ultrasound imaging with Machine Learning (ML) and Artificial Intelligence (AI) techniques can be crucial for such early-stage detection. This work proposes a feature fusion based approach, in conjunction with an ML pipeline that systematically deals with problems such as high dimensionality, class imbalance, and hyperparameter tuning, so that benign vs. malignant classification can be performed efficiently. Experimental evaluation on two publicly available datasets shows that the proposed approach outperforms state-of-the-art techniques on the classification task, with an overall performance above 95% on all the evaluation metrics under consideration and an AUC of approximately 0.99. More specifically, improvements of 1–4%, 2–10% and 2–7% over the current state-of-the-art approaches are obtained for the Accuracy, AUC and Sensitivity metrics respectively, on both datasets. Such an efficient approach can provide real-time decision support to radiologists, making better care for cancer patients possible.
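To make the ideas in the abstract concrete, the following is a minimal, standard-library-only sketch of three of them: feature fusion, class-imbalance handling, and the Sensitivity metric. It is not the authors' implementation. Fusion is assumed here to be plain concatenation of two per-sample feature vectors, and the imbalance step is a simplified SMOTE-style interpolation (the reference list includes SMOTE [9], but the abstract does not state which resampling method the pipeline uses). All function names, variable names, and toy values are hypothetical.

```python
import random


def fuse_features(feature_set_a, feature_set_b):
    """Fuse two per-sample feature sets by concatenating their vectors."""
    if len(feature_set_a) != len(feature_set_b):
        raise ValueError("both feature sets must describe the same samples")
    return [list(a) + list(b) for a, b in zip(feature_set_a, feature_set_b)]


def oversample_minority(X, y, minority_label, seed=0):
    """Balance a binary dataset with SMOTE-style interpolated samples.

    Simplification: pairs are drawn at random from the minority class,
    whereas SMOTE proper interpolates towards k-nearest neighbours.
    """
    rng = random.Random(seed)
    minority = [list(x) for x, label in zip(X, y) if label == minority_label]
    n_missing = (len(y) - len(minority)) - len(minority)
    X_out, y_out = [list(x) for x in X], list(y)
    if len(minority) < 2:
        return X_out, y_out  # nothing to interpolate between
    for _ in range(max(n_missing, 0)):
        a, b = rng.sample(minority, 2)
        t = rng.random()  # position along the segment from a to b
        X_out.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        y_out.append(minority_label)
    return X_out, y_out


def sensitivity(y_true, y_pred, positive=1):
    """Sensitivity (recall of the positive class): TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data only: 5 samples, two hypothetical feature sets, label 1 = malignant.
set_a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
set_b = [[0.1], [0.2], [0.3], [0.4], [0.5]]
labels = [0, 0, 0, 1, 1]

fused = fuse_features(set_a, set_b)
balanced_X, balanced_y = oversample_minority(fused, labels, minority_label=1)
```

In a real pipeline the oversampling step would normally be applied to the training split only, so that synthetic samples never leak into the data used to report Accuracy, AUC, or Sensitivity.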

Code Availability

All the code corresponding to the experimental evaluation of the proposed approach will be made available at https://github.com/arnkmish.

References

  1. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, et al. (2016) TensorFlow: a system for large-scale machine learning. In: 12th USENIX symposium on operating systems design and implementation (OSDI 16), pp 265–283

  2. Al-Dhabyani W, Gomaa M, Khaled H, Fahmy A (2020) Dataset of breast ultrasound images. Data Brief 28:104863

  3. Asraf A, Islam MZ, Haque MR, Islam MM (2020) Deep learning applications to combat novel coronavirus (COVID-19) pandemic. SN Comput Sci 1(6):1–7

  4. Ayon SI, Islam M, et al. (2019) Diabetes prediction: a deep learning approach. Int J Inf Eng Electr Bus 11(2)

  5. Ayon SI, Islam MM, Hossain MR (2020) Coronary artery heart disease prediction: a comparative study of computational intelligence techniques. IETE J Res:1–20

  6. Bradski G, Kaehler A (2000) OpenCV. Dr. Dobb’s J Softw Tools 3

  7. Byra M, Galperin M, Ojeda-Fournier H, Olson L, O’Boyle M, Comstock C, Andre M (2019) Breast mass classification in sonography with transfer learning using a deep convolutional neural network and color conversion. Med Phys 46(2):746–755

  8. Cai L, Wang X, Wang Y, Guo Y, Yu J, Wang Y (2015) Robust phase-based texture descriptor for classification of breast ultrasound images. Biomed Eng Online 14(1):26

  9. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357

  10. Coelho LP (2012) Mahotas: open source software for scriptable computer vision. arXiv:1211.4907

  11. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE Computer society conference on computer vision and pattern recognition (CVPR’05), vol 1. IEEE, pp 886–893

  12. Daoud MI, Abdel-Rahman S, Bdair TM, Al-Najar MS, Al-Hawari FH, Alazrai R (2020) Breast tumor classification in ultrasound images using combined deep and handcrafted features. Sensors 20(23):6838

  13. Das A, Rana S (2021) Exploring residual networks for breast cancer detection from ultrasound images. In: 2021 12th international conference on computing communication and networking technologies (ICCCNT), IEEE, pp 1–6

  14. Das S, Mishra A, Roy P (2018) Automatic diabetes prediction using tree based ensemble learners. In: Proceedings of international conference on computational intelligence and IoT (ICCIIoT)

  15. Das SK, Roy P, Mishra AK (2021) Deep learning techniques dealing with diabetes mellitus: a comprehensive study. In: Health informatics: a computational perspective in healthcare, Springer, pp 295–323

  16. Das SK, Roy P, Mishra AK (2021) DFU_SPNet: a stacked parallel convolution layers based CNN to improve diabetic foot ulcer classification. ICT Express

  17. Das SK, Roy P, Mishra AK (2021) Fusion of handcrafted and deep convolutional neural network features for effective identification of diabetic foot ulcer. Concurr Comput Pract Experience:e6690

  18. Das SK, Roy P, Mishra AK (2021) Oversample-select-tune: a machine learning pipeline for improving diabetes identification. Concurr Comput Pract Experience:e6741

  19. Das SK, Roy P, Mishra AK (2022) Recognition of ischaemia and infection in diabetic foot ulcer: a deep convolutional neural network based approach. Int J Imaging Syst Technol 32(1):192–208

  20. Elreedy D, Atiya AF (2019) A comprehensive analysis of synthetic minority oversampling technique (SMOTE) for handling class imbalance. Inf Sci 505:32–64

  21. Feurer M, Hutter F (2019) Hyperparameter optimization. In: Automated machine learning. Springer, Cham, pp 3–33

  22. Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial networks. arXiv:1406.2661

  23. Haque MR, Islam MM, Iqbal H, Reza MS, Hasan MK (2018) Performance evaluation of random forests and artificial neural networks for the classification of liver disorder. In: 2018 International conference on computer, communication, chemical, material and electronic engineering (IC4ME2), IEEE, pp 1–5

  24. Haralick RM, Shanmugam K, Dinstein IH (1973) Textural features for image classification. IEEE Trans Syst Man Cybern (6):610–621

  25. Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, Wieser E, Taylor J, Berg S, Smith NJ, et al. (2020) Array programming with NumPy. Nature 585(7825):357–362

  26. Hasan MK, Islam MM, Hashem M (2016) Mathematical model development to detect breast cancer using multigene genetic programming. In: 2016 5th international conference on informatics, electronics and vision (ICIEV), IEEE, pp 574–579

  27. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  28. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708

  29. Islam MM, Haque MR, Iqbal H, Hasan MM, Hasan M, Kabir MN (2020) Breast cancer prediction: a comparative study using machine learning techniques. SN Comput Sci 1(5):1–14

  30. Islam MM, Iqbal H, Haque MR, Hasan MK (2017) Prediction of breast cancer using support vector machine and k-nearest neighbors. In: 2017 IEEE Region 10 humanitarian technology conference (R10-HTC), IEEE, pp 226–229

  31. Islam MM, Karray F, Alhajj R, Zeng J (2021) A review on deep learning techniques for the diagnosis of novel coronavirus (COVID-19). IEEE Access 9:30551–30572

  32. Islam MZ, Islam MM, Asraf A (2020) A combined deep CNN-LSTM network for the detection of novel coronavirus (COVID-19) using X-ray images. Inf Med Unlocked 20:100412

  33. Jain D, Borah MD, Biswas A (2020) Fine-tuning TextRank for legal document summarization: a Bayesian optimization based approach. In: Forum for information retrieval evaluation, pp 41–48

  34. Jain D, Mishra AK, Das SK (2021) Machine learning based automatic prediction of Parkinson’s disease using speech features. In: Proceedings of international conference on artificial intelligence and applications, Springer, pp 351–362

  35. Kim SH, Kang BJ, Choi BG, Choi JJ, Lee JH, Song BJ, Choe BJ, Park S, Kim H (2013) Radiologists’ performance for detecting lesions and the interobserver variability of automated whole breast ultrasound. Korean J Radiol 14(2):154–163

  36. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980

  37. Lazarus E, Mainiero MB, Schepps B, Koelliker SL, Livingston LS (2006) BI-RADS lexicon for US and mammography: interobserver variability and positive predictive value. Radiology 239(2):385–391

  38. Li H, Xu Z, Taylor G, Studer C, Goldstein T (2017) Visualizing the loss landscape of neural nets. arXiv:1712.09913

  39. Lo CM, Chang R, Huang C, Moon W (2015) Computer-aided diagnosis of breast tumors using textures from intensity transformed sonographic images. In: 1st global conference on biomedical engineering & 9th Asian-Pacific conference on medical and biological engineering, Springer, pp 124–127

  40. Löfstedt T, Brynolfsson P, Asklund T, Nyholm T, Garpebring A (2019) Gray-level invariant Haralick texture features. PLoS ONE 14(2):e0212110

  41. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440

  42. Mishra AK, Das SK, Roy P, Bandyopadhyay S (2020) Identifying COVID19 from chest CT images: a deep convolutional neural networks based approach. J Healthc Eng:2020

  43. Mishra AK, Roy P, Bandyopadhyay S (2019) Genetic algorithm based selection of appropriate biomarkers for improved breast cancer prediction. In: Proceedings of SAI intelligent systems conference, Springer, pp 724–732

  44. Mishra AK, Roy P, Bandyopadhyay S (2021) Binary particle swarm optimization based feature selection (BPSO-FS) for improving breast cancer prediction. In: Proceedings of international conference on artificial intelligence and applications, Springer, pp 373–384

  45. Mishra AK, Roy P, Bandyopadhyay S, Das SK (2021) Breast ultrasound tumour classification: a machine learning–radiomics based approach. Expert Syst:e12713

  46. Mockus J (2012) Bayesian approach to global optimization: theory and applications. Springer Sci Bus Media 37

  47. Moon WK, Lee YW, Ke HH, Lee SH, Huang CS, Chang RF (2020) Computer-aided diagnosis of breast ultrasound images using ensemble learning from convolutional neural networks. Comput Methods Prog Biomed 190:105361

  48. Moura DC, López MAG (2013) An evaluation of image descriptors combined with clinical data for breast cancer diagnosis. Int J Comput Assist Radiol Surg 8(4):561–574

  49. Muhammad L, Islam MM, Usman SS, Ayon SI (2020) Predictive data mining models for novel coronavirus (COVID-19) infected patients’ recovery. SN Comput Sci 1(4):1–7

  50. Park CS, Kim SH, Jung NY, Choi JJ, Kang BJ, Jung HS (2015) Interobserver variability of ultrasound elastography and the ultrasound BI-RADS lexicon of breast lesions. Breast Cancer 22(2):153–160

  51. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830

  52. Rahman MM, Islam M, Manik M, Hossen M, Al-Rakhami MS, et al. (2021) Machine learning approaches for tackling novel coronavirus (COVID-19) pandemic. SN Comput Sci 2(5):1–10

  53. Rodriguez-Cristerna A, Guerrero-Cedillo C, Donati-Olvera G, Gómez-Flores W, Pereira W (2017) Study of the impact of image preprocessing approaches on the segmentation and classification of breast lesions on ultrasound. In: 2017 14th international conference on electrical engineering, computing science and automatic control (CCE), IEEE, pp 1–4

  54. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention, Springer, pp 234–241

  55. Sadad T, Hussain A, Munir A, Habib M, Ali Khan S, Hussain S, Yang S, Alawairdhi M (2020) Identification of breast malignancy by marker-controlled watershed transformation and hybrid feature set for healthcare. Appl Sci 10(6):1900

  56. Sadoughi F, Kazemy Z, Hamedan F, Owji L, Rahmanikatigari M, Azadboni TT (2018) Artificial intelligence methods for the diagnosis of breast cancer by image processing: a review. Breast Cancer Targets Ther 10:219

  57. Saha P, Sadi MS, Islam MM (2021) EMCNet: automated COVID-19 diagnosis from X-ray images using convolutional neural network and ensemble of machine learning classifiers. Inf Med Unlocked 22:100505

  58. Shi X, Cheng HD, Hu L, Ju W, Tian J (2010) Detection and classification of masses in breast ultrasound images. Digit Signal Process 20(3):824–836

  59. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556

  60. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826

  61. Verma K, Singh BK, Tripathi P, Thoke A (2015) Review of feature selection algorithms for breast cancer ultrasound image. In: New trends in intelligent information and database systems, Springer, pp 23–32

  62. Victoria AH, Maragatham G (2021) Automatic tuning of hyperparameters using Bayesian optimization. Evolving Syst 12:217–223

  63. WCRF (2020) Breast cancer statistics 2018. https://www.wcrf.org/dietandcancer/cancer-trends/breast-cancer-statistics

  64. Wu J, Chen XY, Zhang H, Xiong LD, Lei H, Deng SH (2019) Hyperparameter optimization for machine learning models based on Bayesian optimization. J Electr Sci Technol 17(1):26–40. https://doi.org/10.11989/JEST.1674-862X.80904120. https://www.sciencedirect.com/science/article/pii/S1674862X19300047

  65. Wu T, Sultan LR, Tian J, Cary TW, Sehgal CM (2019) Machine learning for diagnostic ultrasound of triple-negative breast cancer. Breast Cancer Res Treat 173(2):365–373

  66. Xiao T, Liu L, Li K, Qin W, Yu S, Li Z (2018) Comparison of transferred deep neural networks in ultrasonic breast masses discrimination. BioMed Res Int:2018

  67. Xie J, Song X, Zhang W, Dong Q, Wang Y, Li F, Wan C (2020) A novel approach with dual-sampling convolutional neural network for ultrasound image classification of breast tumors. Phys Med Biol 65(24):245001

  68. Yang MC, Moon WK, Wang YCF, Bae MS, Huang CS, Chen JH, Chang RF (2013) Robust texture analysis using multi-resolution gray-scale invariant features for breast sonographic tumor diagnosis. IEEE Trans Med Imaging 32(12):2262–2273

  69. Yap MH, Pons G, Martí J, Ganau S, Sentís M, Zwiggelaar R, Davison AK, Marti R (2017) Automated breast ultrasound lesions detection using convolutional neural networks. IEEE J Biomed Health Inf 22(4):1218–1226

  70. Zhang E, Seiler S, Chen M, Lu W, Gu X (2019) Boundary-aware semi-supervised deep learning for breast ultrasound computer-aided diagnosis. In: 2019 41st annual international conference of the IEEE engineering in medicine and biology society (EMBC), IEEE, pp 947–950

  71. Zhou Z, Siddiquee MMR, Tajbakhsh N, Liang J (2018) UNet++: a nested U-Net architecture for medical image segmentation. In: Deep learning in medical image analysis and multimodal learning for clinical decision support, Springer, pp 3–11

Acknowledgements

We thank Dr. Moi Hoon Yap, Reader in Computer Vision at Manchester Metropolitan University (Department of Computing, Mathematics and Digital Technology, John Dalton Building, Chester Street, Manchester M1 5GD), one of the PIs of the Breast Ultrasound Lesions Dataset (Dataset B) [69], for granting access to the UDIAT dataset for academic research.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Corresponding author

Correspondence to Arnab Kumar Mishra.

Ethics declarations

Competing interests

The authors declare that there are no conflicts of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Data availability

The BUSI dataset used in this work for experimental evaluation is a publicly available dataset introduced in [2]. The UDIAT dataset, also used in this work, is publicly available and can be accessed by following the official request procedure described by the authors of [69].

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mishra, A.K., Roy, P., Bandyopadhyay, S. et al. Feature fusion based machine learning pipeline to improve breast cancer prediction. Multimed Tools Appl 81, 37627–37655 (2022). https://doi.org/10.1007/s11042-022-13498-4

  • DOI: https://doi.org/10.1007/s11042-022-13498-4
