A new intelligent hybrid feature extraction model for automating cancer diagnosis: a focus on breast cancer

The Journal of Supercomputing

Abstract

Cancer, or malignant tumor, is a group of diseases arising from the abnormal proliferation of body cells that can invade or spread to other parts of the body. Many researchers have proposed methods to detect breast cancer; however, the accuracy of these methods has often been insufficient because of ineffective feature selection and a lack of appropriate analytical techniques. Addressing this issue requires an accurate feature extraction model. In this paper, we propose an intelligent hybrid feature extraction model for automating cancer diagnosis (IHFEACD) with high accuracy. This mathematical model generates more efficient features based on the structure of previous feature formulas. Furthermore, the proposed model combines the new features with existing ones to create a new feature space for early cancer detection. Although the model can be applied to detect different types of cancer, we focus on breast cancer in women as our case study. To validate our approach, we investigated the Mammographic Image Analysis Society (MIAS) database and the Curated Breast Imaging Subset of the Digital Database for Screening Mammography (CBIS-DDSM). The results indicate that the proposed method effectively classifies normal/abnormal and benign/malignant cases. By optimizing the feature structure in this new space, we achieved improved accuracy in breast cancer diagnosis. The simulation results demonstrate high performance, with an accuracy of 99.8%, a sensitivity of 98%, and a specificity of 99.4% using the naive Bayes (NB) classifier on the MIAS database. At a training/test rate of 0.8 on the MIAS database, the proposed IHFEACD approach also outperforms other methods in accuracy, with improvements of 0.3%, 1%, 6.8%, and 0.5% over the IAIS-ABC-CDS, CADx, OKMT-SGO, and ANN-t-SNE approaches, respectively.

For the CBIS-DDSM database, the breast cancer detection results are also remarkable, with an accuracy of 99.5%, a sensitivity of 98.8%, and a specificity of 99.3% using both simple and naive Bayes classifiers. These results give a clearer picture of the robustness of the model across different databases. The proposed approach improves significantly on previous methods across several comparison criteria. Finally, this model has the potential to assist medical professionals in making informed decisions regarding breast cancer diagnosis.
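The accuracy, sensitivity, and specificity figures quoted in the abstract are standard confusion-matrix metrics. As a minimal sketch of how such values are typically computed for a binary task such as normal/abnormal or benign/malignant, the Python snippet below uses hypothetical helper names (`confusion_counts`, `diagnostic_metrics`) and toy labels. None of this comes from the paper, and it does not reproduce the authors' pipeline or their reported results.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels, where 1 marks the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # positives correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # negatives correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # negatives wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # positives missed
    return tp, tn, fp, fn


def diagnostic_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity (recall on positives), and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    nan = float("nan")  # returned when a denominator would be zero
    return {
        "accuracy": (tp + tn) / total if total else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Toy labels for illustration only (assumption: 1 = abnormal, 0 = normal).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = diagnostic_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters on screening data, where the classes are often imbalanced and accuracy alone can hide missed positive cases.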

Data availability

No datasets were generated or analyzed during the current study.

Acknowledgements

The authors would like to thank Dr. Hadi Mojez and Engineer Bahram Nazeri for their valuable guidance regarding MATLAB software training and simulation.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Contributions

All authors reviewed the manuscript.

Corresponding author

Correspondence to Shahin Akbarpour.

Ethics declarations

Conflict of interests

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Rahmani, R., Akbarpour, S., Farzan, A. et al. A new intelligent hybrid feature extraction model for automating cancer diagnosis: a focus on breast cancer. J Supercomput 81, 651 (2025). https://doi.org/10.1007/s11227-025-07077-1

  • DOI: https://doi.org/10.1007/s11227-025-07077-1

Keywords