New Approach to Support the Breast Cancer Diagnosis Process Using Frequent Pattern Growth and Stacking Based on Machine Learning Techniques

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2024 (IDEAL 2024)

Abstract

Breast cancer is one of the most common cancers in women, and early detection significantly improves survival. Although mammography is one of the least invasive and most widely used diagnostic methods, the complexity and subjectivity of its interpretation present significant challenges. In this paper, we propose a new approach that supports the breast cancer diagnosis process by assisting in the classification of mammography images, either as malignant or benign or according to the BI-RADS system. Our proposal consists of two phases. In the first phase, we apply the FP-Growth algorithm to patients’ clinical data, analyzing variables such as age and sex to identify frequent patterns. This allows us to explore, group, and visually characterize shared findings and trends in the clinical data, which helps doctors create risk groups or establish a pre-diagnosis based on the patient’s profile. In this phase, we also prepare the images for training the different models. In the second phase, we use stacking to combine the strengths of two models, a Random Forest (RF) and a Convolutional Neural Network (CNN) with transfer learning, to improve image classification and diagnosis. We also explore other methods, such as CNN and Support Vector Machine (SVM), to compare the proposed methodology against conventional techniques. The models were trained on two public datasets: “The Chinese Mammography Database” [2] and “The INbreast database” [3]. Performance is evaluated with classification metrics such as accuracy, precision, recall, and F1 score. The results show that combining base models through stacking achieves significantly better performance than the individual models, with ideal scores in accuracy, recall, and F1 score when k-fold cross-validation is used in the meta-model. These results suggest that combining multiple base models captures the underlying complexities and patterns in the data more effectively.
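
To make the first phase more concrete, the following is a minimal sketch of FP-Growth frequent-pattern mining on clinical attributes. It is not the authors' implementation: the records, the attribute tags (`age:…`, `sex:…`, `finding:…`), the `min_count` threshold, and the function names are all invented for illustration. The code builds an FP-tree, then recursively mines the conditional pattern base of each frequent item.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(weighted, min_count):
    """Build an FP-tree from (items, count) pairs; return header table and supports."""
    support = defaultdict(int)
    for items, count in weighted:
        for item in set(items):
            support[item] += count
    support = {i: c for i, c in support.items() if c >= min_count}

    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes carrying that item
    for items, count in weighted:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((i for i in set(items) if i in support),
                         key=lambda i: (-support[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, support


def fp_growth(weighted, min_count, suffix=frozenset()):
    """Return {frozenset(items): support count} for every frequent itemset."""
    header, support = build_tree(weighted, min_count)
    patterns = {}
    for item, nodes in header.items():
        pattern = suffix | {item}
        patterns[pattern] = support[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        patterns.update(fp_growth(base, min_count, pattern))
    return patterns


# Invented toy records (NOT data from the paper): one set of tags per patient.
records = [
    {"age:50-59", "sex:F", "finding:mass"},
    {"age:50-59", "sex:F", "finding:mass"},
    {"age:40-49", "sex:F", "finding:calcification"},
    {"age:50-59", "sex:F", "finding:calcification"},
    {"age:40-49", "sex:F", "finding:mass"},
]

# Each record counts once; keep itemsets present in at least 3 of the 5 records.
patterns = fp_growth([(r, 1) for r in records], min_count=3)
for items, count in sorted(patterns.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(items), count)
```

Frequent itemsets of this kind are the sort of shared pattern that could be used to characterize risk groups; an absolute count threshold is used here instead of a relative support only to keep the sketch short.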

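The evaluation metrics named in the abstract can all be derived from confusion-matrix counts. The sketch below is illustrative only: the function name, the label strings, and the sample predictions are invented, and treating "malignant" as the positive class is an assumption rather than something the abstract states.

```python
def classification_metrics(y_true, y_pred, positive="malignant"):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented labels and predictions, not results from the paper.
y_true = ["malignant"] * 4 + ["benign"] * 6
y_pred = ["malignant", "malignant", "malignant", "benign",
          "malignant", "malignant", "benign", "benign", "benign", "benign"]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a diagnostic setting recall on the malignant class is usually the metric to watch, since a false negative corresponds to a missed cancer.
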
References

  1. American Cancer Society. Breast cancer facts and figures 2021–2022 (2022). https://www.cancer.org/es/cancer/prevencion-del-riesgo/entender-el-riesgo-de-cancer/cancer-datos-factuales/informacion-sobre-el-cancer-para-mujeres.html

  2. Cui, C., et al.: Chinese mammography database (CMMD): a biopsy-confirmed mammography database online for automatic breast diagnosis. Cancer Imaging Archive (2021). https://doi.org/10.7937/tcia.eqde-4b16

  3. Holeček, M.: InBreast [Data set] (2020). https://www.kaggle.com/datasets/martholi/inbreast

  4. Hurtado, R., Guzmán, S., Muñoz, A.: An architecture and a new deep learning method for head and neck cancer prognosis by analyzing serial positron emission tomography images. In: Naiouf, M., Rucci, E., Chichizola, F., De Giusti, L. (eds.) JCC-BD &ET 2023. Communications in Computer and Information Science, vol. 1828, pp. 129–140. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-40942-4_10

  5. Huang, M.-L., Lin, T.-Y.: Dataset of breast mammography images with masses. Data Brief 31, 105928 (2020). https://doi.org/10.1016/j.dib.2020.105928

  6. Sanmartín, J., Azuero, P., Hurtado, R.: A modern approach to osteosarcoma tumor identification through integration of FP-growth, transfer learning and stacking model. In: Rocha, Á., Ferrás, C., Hochstetter Diez, J., Diéguez Rebolledo, M. (eds.) ICITS 2024. LNNS, vol. 932, pp. 298–307. Springer, Cham (2024). https://doi.org/10.1007/978-3-031-54235-0_28

  7. Zhang, Y., et al.: Deep learning-based automatic diagnosis of breast cancer on MRI using mask R-CNN for detection followed by ResNet50 for classification. Acad. Radiol. 30(Supplement 2), S161–S171 (2023). https://doi.org/10.1016/j.acra.2022.12.038. ISSN 1076-6332

  8. Caruana, R., et al.: Intelligible models for healthcare: predicting pneumonia risk and hospital 30-day readmission. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2015). https://doi.org/10.1145/2783258.2788613

  9. Huang, Y.: Prediction of breast cancer via deep learning. In: Patnaik, S., Kountchev, R., Tai, Y., Kountcheva, R. (eds.) 3D Imaging—Multidimensional Signal Processing and Deep Learning. Smart Innovation, Systems and Technologies, vol. 349, pp. 87–97. Springer, Singapore (2023). https://doi.org/10.1007/978-981-99-1230-8_8

  10. Novillo, E., Montesdeoca, M., Hurtado, R.: Cutting-edge advanced machine learning model for enhanced breast cancer diagnostics. In: Yang, X.S., Sherratt, S., Dey, N., Joshi, A. (eds.) ICICT 2024. LNNS, vol. 1003, pp. 463–472. Springer, Singapore (2024). https://doi.org/10.1007/978-981-97-3302-6_37

Author information

Corresponding author

Correspondence to Remigio Hurtado.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sanmartín, J., Azuero, P., Hurtado, R. (2025). New Approach to Support the Breast Cancer Diagnosis Process Using Frequent Pattern Growth and Stacking Based on Machine Learning Techniques. In: Julian, V., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2024. IDEAL 2024. Lecture Notes in Computer Science, vol 15347. Springer, Cham. https://doi.org/10.1007/978-3-031-77738-7_4

  • DOI: https://doi.org/10.1007/978-3-031-77738-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-77737-0

  • Online ISBN: 978-3-031-77738-7

  • eBook Packages: Computer Science (R0)
