Abstract
Breast cancer is one of the most common types of cancer among Jordanian women. Healthcare organizations in Jordan have recently adopted electronic health records, which gives researchers access to large volumes of medical records. The goal of this study is to predict the recurrence of breast cancer using machine learning algorithms. We developed a Natural Language Processing algorithm to extract key breast cancer features from medical records at King Abdullah University Hospital (KAUH) in Jordan. We integrated these features and built a medical dictionary for breast cancer. We then applied multiple machine learning algorithms to the extracted information to predict the recurrence of breast cancer in patients. The predicted results were approved by specialist physicians from KAUH, and the accuracy of the data in the medical dictionary was validated by its target users (physicians and researchers). The dictionary can be used for personalized medicine. All of the machine learning algorithms performed well, and the OneR algorithm had the best balance of sensitivity and specificity. The medical dictionary will help physicians choose the most appropriate treatment plan in a short time, and the machine learning predictions can help them make the correct clinical decision regarding treatment options.
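To make the OneR result concrete, the following is a minimal sketch of a one-rule classifier together with the sensitivity and specificity calculation. It is not the authors' implementation: the feature names (lymph node status, HER2 status), the eight toy records, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

def train_one_r(rows, labels):
    """Pick the single feature whose value -> majority-class rule
    makes the fewest errors on the training data."""
    default = Counter(labels).most_common(1)[0][0]  # fallback for unseen values
    best = None
    for f in range(len(rows[0])):
        # Count class labels for each value of feature f.
        counts = defaultdict(Counter)
        for row, y in zip(rows, labels):
            counts[row[f]][y] += 1
        # One rule per feature: each value predicts its majority class.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(rule[row[f]] != y for row, y in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, f, rule)
    _, f, rule = best
    return f, rule, default

def predict_one_r(model, row):
    f, rule, default = model
    return rule.get(row[f], default)

def sensitivity_specificity(y_true, y_pred, positive="recurrence"):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical toy records: (lymph_node_status, her2_status).
rows = [
    ("positive", "positive"),
    ("positive", "negative"),
    ("positive", "positive"),
    ("positive", "negative"),
    ("negative", "positive"),
    ("negative", "negative"),
    ("negative", "negative"),
    ("negative", "negative"),
]
labels = [
    "recurrence", "recurrence", "recurrence", "no_recurrence",
    "no_recurrence", "no_recurrence", "recurrence", "no_recurrence",
]

model = train_one_r(rows, labels)
predictions = [predict_one_r(model, r) for r in rows]
sens, spec = sensitivity_specificity(labels, predictions)
print("chosen feature index:", model[0])
print("sensitivity:", sens, "specificity:", spec)
```

On this toy data the rule is built on the first feature, and sensitivity and specificity are measured on the training records themselves. A real evaluation would use held-out data or cross-validation.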
Cite this article
Alzu’bi, A., Najadat, H., Doulat, W. et al. Predicting the recurrence of breast cancer using machine learning algorithms. Multimed Tools Appl 80, 13787–13800 (2021). https://doi.org/10.1007/s11042-020-10448-w
DOI: https://doi.org/10.1007/s11042-020-10448-w