Stroke Prediction Using Machine Learning in a Distributed Environment

Rajora, Maihul; Rathod, Mansi; Naik, Nenavath Srinivas

doi:10.1007/978-3-030-65621-8_15

Conference paper
First Online: 12 December 2020

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12582)

Abstract

As lifestyles change, certain biological dimensions of human life are changing with them, making people more vulnerable to stroke. Stroke is a medical condition in which parts of the brain do not receive blood supply, and it can be fatal. Because stroke cases are increasing at an alarming rate, there is a need to analyze the factors driving this growth and to design an approach that predicts whether a person will be affected by stroke. This paper analyzes different machine learning algorithms for better stroke prediction. The algorithms analyzed are Naive Bayes, Logistic Regression, Decision Tree, Random Forest, and Gradient Boosting. We use a dataset consisting of 11 features such as age, gender, and BMI (body mass index). These features are analyzed using univariate and multivariate plots to observe the correlations between them. The analysis also shows that some features, such as age, gender, and smoking status, are important factors, while others, such as residence, are of less importance. The proposed work is implemented using Apache Spark, a distributed general-purpose cluster-computing framework. The Receiver Operating Characteristic (ROC) curves of the algorithms are compared, and Gradient Boosting gives the best results, with an ROC area score of 0.90. After fine-tuning certain parameters of the Gradient Boosting algorithm (the learning rate, the depth of the trees, the number of trees, and the minimum sample split), the ROC area score rises to 0.94. The other performance measures, Accuracy, Precision, Recall, and F1 score, are 0.867, 0.8673, 0.866, and 0.8659 respectively before fine-tuning, and 0.9449, 0.9453, 0.9449, and 0.9448 respectively after fine-tuning.
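For readers unfamiliar with the measures reported in the abstract, the following is a minimal sketch of how an ROC area score, Accuracy, Precision, Recall, and F1 score can be computed for a binary classifier. It is not the authors' Spark implementation: the function names `roc_auc` and `binary_metrics`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. The abstract also does not say whether Precision, Recall, and F1 were averaged across classes, so the sketch uses the plain positive-class definitions.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed with the rank formulation.

    The value equals the probability that a randomly chosen positive
    example is scored higher than a randomly chosen negative example,
    with ties counted as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def binary_metrics(labels: Sequence[int], predictions: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # true negatives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data, invented for illustration only (not from the paper's dataset).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
# Assumed decision threshold of 0.5 to turn scores into class predictions.
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

print(roc_auc(toy_labels, toy_scores))
print(binary_metrics(toy_labels, toy_predictions))
```

The ROC area score depends only on how the model ranks examples, whereas the other four measures depend on the chosen threshold, which is one reason the two kinds of measure are usually reported side by side.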

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, IIIT Naya Raipur, Naya Raipur, India
Maihul Rajora & Mansi Rathod
Department of Computer Science and Engineering, IIIT Naya Raipur, Naya Raipur, India
Nenavath Srinivas Naik

Corresponding author

Correspondence to Mansi Rathod.

Editor information

Editors and Affiliations

Indian Institute of Technology Guwahati, Guwahati, India
Diganta Goswami
University of Engineering and Technology, Hanoi, Vietnam
Truong Anh Hoang

About this paper

Cite this paper

Rajora, M., Rathod, M., Naik, N.S. (2021). Stroke Prediction Using Machine Learning in a Distributed Environment. In: Goswami, D., Hoang, T.A. (eds) Distributed Computing and Internet Technology. ICDCIT 2021. Lecture Notes in Computer Science, vol 12582. Springer, Cham. https://doi.org/10.1007/978-3-030-65621-8_15

DOI: https://doi.org/10.1007/978-3-030-65621-8_15
Published: 12 December 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-65620-1
Online ISBN: 978-3-030-65621-8
eBook Packages: Computer Science, Computer Science (R0)