Abstract
Throughout history, people’s health has been linked to internal and external factors that shape their socio-economic environment. Breast cancer is a clear example: its risk factors include physical inactivity, weight gain and alcohol consumption, among others. Most predicted cases occur in women, with a small proportion in men. The spread of technology across most sciences and fields of work has accelerated progress in solving complex problems. Big data applied to health makes it possible to discover relevant information in data on diseases, prognoses and treatments. The main objective of this research is to determine the diagnosis of breast cancer from the results of classification models applied to genomic data. The research methodology is quantitative, with measurable and verifiable results. The development methodology is a modified incremental methodology, with flexible steps for verifying and revising the results obtained in each activity. Experiments are carried out in RStudio with the R programming language.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chichande, B.S.C., Pino, A.V., Ordoñez, J.P. (2025). Implementation of Classification Algorithms on Genomic Data in Order to Determine the Diagnosis of Patients at Risk of Developing Breast Cancer. In: Guarda, T., Portela, F., Gatica, G. (eds) Advanced Research in Technologies, Information, Innovation and Sustainability. ARTIIS 2024. Communications in Computer and Information Science, vol 2346. Springer, Cham. https://doi.org/10.1007/978-3-031-83210-9_1
DOI: https://doi.org/10.1007/978-3-031-83210-9_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-83209-3
Online ISBN: 978-3-031-83210-9
eBook Packages: Computer Science (R0)