New Approach to Support the Breast Cancer Diagnosis Process Using Frequent Pattern Growth and Stacking Based on Machine Learning Techniques

Sanmartín, John; Azuero, Paulina; Hurtado, Remigio

doi:10.1007/978-3-031-77738-7_4

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15347)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

Breast cancer is one of the most common types of cancer in women, and its early detection significantly improves the survival rate. Although mammography is one of the least invasive and most widely used methods in the diagnostic process, its complexity and the subjectivity of medical interpretation present significant challenges. In this article, we propose a new approach that supports the breast cancer diagnosis process by assisting in the classification of mammography images as malignant or benign, or according to the BIRADS system. Our proposal consists of two phases. In the first phase, we apply the FP-Growth algorithm to patients' clinical data, analyzing variables such as age and sex to identify frequent patterns. This allows us to explore, group, and visually characterize shared findings and trends in the clinical data, which is useful for doctors when creating risk groups or establishing a pre-diagnosis based on the patient's profile. In this phase, we also prepare the images for training the different models. In the second phase, we combine the strengths of two models through stacking, a Random Forest (RF) model and a Convolutional Neural Network (CNN) with transfer learning, to improve image classification and diagnosis. We also explore other methods, such as a standalone CNN and a Support Vector Machine (SVM), to compare the proposed methodology against conventional techniques. The models were trained using public datasets: “The Chinese Mammography Database” [2] and “The INbreast database” [3]. The method is evaluated using classification metrics: accuracy, precision, recall, and F1 score. The results show that combining base models using a stacking strategy achieves significantly better performance than the individual models, with ideal scores in accuracy, recall, and F1 score using k-fold cross-validation in the meta-model. These results suggest that combining multiple base models captures the underlying complexities and patterns in the data more effectively.
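To make the first phase more concrete, the following is a minimal, self-contained sketch of FP-Growth in plain Python. It is not the authors' implementation: the record fields (`age=...`, `sex=...`, `finding=...`), the toy records, and the `min_count` threshold are all hypothetical and chosen only for illustration. The sketch builds a prefix tree of frequency-ordered items and then mines it recursively through conditional pattern bases.

```python
from collections import defaultdict


class FPNode:
    """One node of the FP-tree: an item, its count, and links to parent/children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(transactions, min_count):
    """Build an FP-tree from (items, count) pairs.

    Returns (header, freq): header maps each frequent item to its tree nodes,
    freq maps each frequent item to its total support count.
    """
    freq = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_count}

    root = FPNode(None, None)
    header = defaultdict(list)
    for items, count in transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, freq


def fp_growth(transactions, min_count, suffix=frozenset()):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    header, freq = build_tree(transactions, min_count)
    patterns = {}
    for item, nodes in header.items():
        pattern = suffix | {item}
        patterns[pattern] = freq[item]
        # Conditional pattern base: the prefix path above each node,
        # weighted by that node's count.
        conditional = []
        for node in nodes:
            path = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        patterns.update(fp_growth(conditional, min_count, pattern))
    return patterns


# Hypothetical clinical records (illustrative only, not data from the paper).
records = [
    {"age=40-49", "sex=F", "finding=mass"},
    {"age=50-59", "sex=F", "finding=calcification"},
    {"age=50-59", "sex=F", "finding=mass"},
    {"age=50-59", "sex=F", "finding=mass"},
    {"age=40-49", "sex=F", "finding=calcification"},
]
patterns = fp_growth([(r, 1) for r in records], min_count=3)
for itemset, support in sorted(patterns.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), support)
```

With these toy records and `min_count=3`, the frequent itemsets are the three single items `sex=F`, `age=50-59`, and `finding=mass`, plus the two pairs that combine `sex=F` with each of the other two. The pair `age=50-59` with `finding=mass` occurs only twice and is therefore pruned.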
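The four evaluation metrics named in the abstract can be computed from the confusion-matrix counts of a binary classifier. The sketch below assumes the label convention 1 = malignant and 0 = benign, and the label vectors are made up for illustration; they are not results from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = malignant, 0 = benign (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here there are 3 true positives, 1 false positive, 1 false negative, and 5 true negatives, so accuracy is 0.8 while precision, recall, and F1 are each 0.75. In a diagnostic setting, recall on the malignant class is often the metric to watch most closely, because a false negative corresponds to a missed cancer.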

References

American Cancer Society: Breast cancer facts and figures 2021–2022 (2022). https://www.cancer.org/es/cancer/prevencion-del-riesgo/entender-el-riesgo-de-cancer/cancer-datos-factuales/informacion-sobre-el-cancer-para-mujeres.html
Cui, C., et al.: Chinese mammography database (CMMD): a biopsy-confirmed mammography database online for automatic breast diagnosis. Cancer Imaging Archive (2021). https://doi.org/10.7937/tcia.eqde-4b16
Holeček, M.: InBreast [Data set] (2020). https://www.kaggle.com/datasets/martholi/inbreast
Hurtado, R., Guzmán, S., Muñoz, A.: An architecture and a new deep learning method for head and neck cancer prognosis by analyzing serial positron emission tomography images. In: Naiouf, M., Rucci, E., Chichizola, F., De Giusti, L. (eds.) JCC-BD &ET 2023. Communications in Computer and Information Science, vol. 1828, pp. 129–140. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-40942-4_10
Huang, M.-L., Lin, T.-Y.: Dataset of breast mammography images with masses. Data Brief 31, 105928 (2020). https://doi.org/10.1016/j.dib.2020.105928
Sanmartín, J., Azuero, P., Hurtado, R.: A modern approach to osteosarcoma tumor identification through integration of FP-growth, transfer learning and stacking model. In: Rocha, Á., Ferrás, C., Hochstetter Diez, J., Diéguez Rebolledo, M. (eds.) ICITS 2024. LNNS, vol. 932, pp. 298–307. Springer, Cham (2024). https://doi.org/10.1007/978-3-031-54235-0_28
Zhang, Y., et al.: Deep learning-based automatic diagnosis of breast cancer on MRI using mask R-CNN for detection followed by ResNet50 for classification. Acad. Radiol. 30(Supplement 2), S161–S171 (2023). https://doi.org/10.1016/j.acra.2022.12.038. ISSN 1076-6332
Caruana, R., et al.: Intelligible models for healthcare: predicting pneumonia risk and hospital 30-day readmission. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2015). https://doi.org/10.1145/2783258.2788613
Huang, Y.: Prediction of breast cancer via deep learning. In: Patnaik, S., Kountchev, R., Tai, Y., Kountcheva, R. (eds.) 3D Imaging—Multidimensional Signal Processing and Deep Learning. Smart Innovation, Systems and Technologies, vol. 349, pp. 87–97. Springer, Singapore (2023). https://doi.org/10.1007/978-981-99-1230-8_8
Novillo, E., Montesdeoca, M., Hurtado, R.: Cutting-edge advanced machine learning model for enhanced breast cancer diagnostics. In: Yang, X.S., Sherratt, S., Dey, N., Joshi, A. (eds.) ICICT 2024. LNNS, vol. 1003, pp. 463–472. Springer, Singapore (2024). https://doi.org/10.1007/978-981-97-3302-6_37

Author information

Authors and Affiliations

Universidad Politécnica Salesiana, Cuenca, Ecuador
John Sanmartín, Paulina Azuero & Remigio Hurtado

Corresponding author

Correspondence to Remigio Hurtado.

Editor information

Editors and Affiliations

Technical University of Valencia, Valencia, Valencia, Spain
Vicente Julian
Technical University of Madrid, Madrid, Spain
David Camacho
The University of Manchester, Manchester, UK
Hujun Yin
Universitat Politècnica de València, Valencia, Valencia, Spain
Juan M. Alberola
University of Evora, Evora, Portugal
Vitor Beires Nogueira
Universidade do Minho, Braga, Portugal
Paulo Novais
University of Huelva, Huelva, Spain
Antonio Tallón-Ballesteros

About this paper

Cite this paper

Sanmartín, J., Azuero, P., Hurtado, R. (2025). New Approach to Support the Breast Cancer Diagnosis Process Using Frequent Pattern Growth and Stacking Based on Machine Learning Techniques. In: Julian, V., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2024. IDEAL 2024. Lecture Notes in Computer Science, vol 15347. Springer, Cham. https://doi.org/10.1007/978-3-031-77738-7_4

DOI: https://doi.org/10.1007/978-3-031-77738-7_4
Published: 14 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-77737-0
Online ISBN: 978-3-031-77738-7
eBook Packages: Computer Science, Computer Science (R0)
