Stroke Prediction Using Machine Learning in a Distributed Environment

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12582)

Abstract

As lifestyles change, certain biological dimensions of human life are changing too, making people more vulnerable to stroke. A stroke occurs when part of the brain is deprived of its blood supply, and it can be fatal. Because stroke cases are increasing at an alarming rate, there is a need to analyze the factors driving this growth and to design an approach that predicts whether a person will suffer a stroke. This paper analyses different machine learning algorithms for predicting stroke: Naive Bayes, Logistic Regression, Decision Tree, Random Forest and Gradient Boosting. We use a dataset of 11 features, such as age, gender and BMI (body mass index), and analyze them with univariate and multivariate plots to observe the correlations among them. The analysis shows that some features, such as age, gender and smoking status, are important factors, while others, such as residence, are of less importance. The proposed work is implemented using Apache Spark, a distributed, general-purpose cluster-computing framework. Comparing the Receiver Operating Characteristic (ROC) curve of each algorithm shows that Gradient Boosting gives the best results, with an area under the ROC curve (AUC) of 0.90. After fine-tuning certain Gradient Boosting parameters, namely the learning rate, the tree depth, the number of trees and the minimum sample split, the AUC rises to 0.94. The other performance measures, Accuracy, Precision, Recall and F1 score, are 0.867, 0.8673, 0.866 and 0.8659 respectively before fine-tuning, and 0.9449, 0.9453, 0.9449 and 0.9448 respectively after fine-tuning.
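
The five classifiers are ranked by the area under their ROC curves. As a minimal, library-free sketch of what that score measures (not the authors' code, which ran on Apache Spark), the hypothetical helper `roc_auc` below computes AUC as the probability that a randomly chosen stroke case is scored above a randomly chosen non-stroke case. The labels and scores in the usage example are made-up values for illustration only.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as a Mann-Whitney statistic.

    AUC is the probability that a randomly chosen positive example
    (label 1, e.g. stroke) gets a higher score than a randomly chosen
    negative example (label 0); tied scores count as half a win.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:  # compare every positive/negative pair
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical labels and predicted stroke probabilities (not from the paper).
    y_true = [1, 1, 0, 0, 1, 0]
    y_score = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
    print(round(roc_auc(y_true, y_score), 4))  # 7 of 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.90 and 0.94 should be read. The pairwise loop costs O(P·N) comparisons for P positive and N negative examples; practical implementations usually sort the scores once instead.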

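Accuracy, precision, recall and F1 are reported, but the abstract does not state how per-class precision and recall were averaged for this two-class problem. One common choice weights each class by its share of the samples; Spark MLlib's `MulticlassClassificationEvaluator`, for example, exposes this as the `weightedPrecision` and `weightedRecall` metrics. Under that weighting, recall always equals accuracy, which is consistent with the tuned results (0.9449 for both), although the abstract itself does not confirm this averaging. The sketch below, in plain Python with hypothetical helper names (`partition_counts`, `merge`, `weighted_metrics`), computes these values in a map-reduce style: each data partition yields partial confusion counts, and the partial counts are summed, the same aggregation pattern a cluster framework such as Spark applies. It is an illustration of the pattern, not how MLlib implements its evaluators internally.

```python
from collections import Counter
from functools import reduce
from typing import Dict, Iterable, Tuple

Pair = Tuple[int, int]  # (true label, predicted label)


def partition_counts(partition: Iterable[Pair]) -> Counter:
    """Map step: count (true, predicted) pairs within one partition."""
    return Counter(partition)


def merge(a: Counter, b: Counter) -> Counter:
    """Reduce step: add the partial confusion counts of two partitions."""
    return a + b


def weighted_metrics(confusion: Counter) -> Dict[str, float]:
    """Accuracy plus precision, recall and F1 averaged with class-support weights."""
    total = sum(confusion.values())
    if total == 0:
        raise ValueError("empty confusion counts")
    classes = sorted({label for pair in confusion for label in pair})
    correct = sum(n for (t, p), n in confusion.items() if t == p)
    result = {"accuracy": correct / total, "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c in classes:
        tp = confusion[(c, c)]  # Counter returns 0 for missing keys
        actual = sum(n for (t, _), n in confusion.items() if t == c)
        predicted = sum(n for (_, p), n in confusion.items() if p == c)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = actual / total  # share of samples whose true label is c
        result["precision"] += weight * precision
        result["recall"] += weight * recall
        result["f1"] += weight * f1
    return result


if __name__ == "__main__":
    # Two hypothetical partitions of (true, predicted) pairs; 1 = stroke.
    partitions = [
        [(1, 1), (0, 0), (0, 0), (1, 0)],
        [(0, 0), (1, 0), (0, 0), (0, 1)],
    ]
    confusion = reduce(merge, map(partition_counts, partitions), Counter())
    metrics = weighted_metrics(confusion)
    # Weighted recall is mathematically identical to accuracy.
    print({name: round(value, 4) for name, value in metrics.items()})
```

Because the partial counts are simply added, the partitions can be processed in any order on any worker and merged afterwards, which is what makes this kind of metric cheap to compute in a distributed setting.
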
Author information

Correspondence to Mansi Rathod.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Rajora, M., Rathod, M., Naik, N.S. (2021). Stroke Prediction Using Machine Learning in a Distributed Environment. In: Goswami, D., Hoang, T.A. (eds) Distributed Computing and Internet Technology. ICDCIT 2021. Lecture Notes in Computer Science, vol 12582. Springer, Cham. https://doi.org/10.1007/978-3-030-65621-8_15

  • DOI: https://doi.org/10.1007/978-3-030-65621-8_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-65620-1

  • Online ISBN: 978-3-030-65621-8

  • eBook Packages: Computer Science, Computer Science (R0)
