A Cluster-Based Machine Learning Model for Large Healthcare Data Analysis

  • Conference paper
Big Data Innovations and Applications (Innovate-Data 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1054)

Abstract

The amount of patient survey data generated by hospitals and the wider healthcare industry is growing rapidly. The curse of dimensionality is a barrier to extracting useful information from these data, information that could help in the treatment and care of patients. Methods are therefore needed to determine, from such large volumes of stored information, which features matter most for a desired output. Health-related quality of life (HRQOL) is a powerful paradigm for defining such an output, measured as patient satisfaction. In this setting it is difficult to identify, within high-dimensional data, the features that best represent the desired output and to explain them so that they can be used in the future at the point of care. In this paper we propose a Cluster-Based Random Forest (CB-RF) method to identify the most important features for the desired output, namely the Expanded Prostate Cancer Index Composite-26 (EPIC-26) domain scores. EPIC-26 is used to assess a range of HRQOL issues related to the diagnosis and treatment of prostate cancer. Several feature extraction methods are compared, and the best is the proposed CB-RF model, which finds the most important features (10 or fewer) out of more than 1500. These features can be used to estimate patients' EPIC-26 values accurately, with an average correlation coefficient of 85% between predicted and observed values on a real dataset of 5093 patients.
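
The abstract does not say how the clustering step and the random forest are combined, so the following is only a rough sketch of one plausible reading, not the authors' implementation. It assumes that patients are grouped with k-means, that a random forest is fit within each cluster, and that features are ranked by impurity-based importance. The data are synthetic, and the function name `cluster_then_forest`, the number of clusters, and all sizes are hypothetical choices made for illustration (scikit-learn and NumPy are assumed to be available).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for survey data (NOT the paper's dataset).
rng = np.random.default_rng(0)
n_patients, n_features = 600, 50
X = rng.normal(size=(n_patients, n_features))
group = rng.integers(0, 2, size=n_patients)
X[group == 1, :2] += 6.0  # shift two features so the two groups are separable

# Hypothetical target: feature 3 matters everywhere, feature 7 or 12 by group.
y = 5.0 * X[:, 3] + 3.0 * np.where(group == 0, X[:, 7], X[:, 12])
y += rng.normal(scale=0.1, size=n_patients)


def cluster_then_forest(X, y, n_clusters=2, top_k=10, seed=0):
    """Assumed pipeline: k-means on patients, then one forest per cluster."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    results = {}
    for c in range(n_clusters):
        Xc, yc = X[labels == c], y[labels == c]
        X_tr, X_te, y_tr, y_te = train_test_split(
            Xc, yc, test_size=0.25, random_state=seed
        )
        # Rank all features by impurity-based importance from a full forest.
        ranker = RandomForestRegressor(n_estimators=100, random_state=seed)
        ranker.fit(X_tr, y_tr)
        top = np.argsort(ranker.feature_importances_)[::-1][:top_k]
        # Refit on the selected features only and score on held-out patients.
        model = RandomForestRegressor(n_estimators=100, random_state=seed)
        model.fit(X_tr[:, top], y_tr)
        r = np.corrcoef(model.predict(X_te[:, top]), y_te)[0, 1]
        results[c] = (sorted(top.tolist()), float(r))
    return results


results = cluster_then_forest(X, y)
for c, (features, r) in results.items():
    print(f"cluster {c}: top features {features}, held-out r = {r:.2f}")
```

The held-out Pearson correlation printed for each cluster plays the role of the "coefficient of correlation between predicted and observed values" mentioned in the abstract; the values obtained on this toy data say nothing about the paper's reported 85%.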

Author information

Corresponding author

Correspondence to Fatemeh Sharifi.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Sharifi, F., Mohammed, E., Crump, T., Far, B.H. (2019). A Cluster-Based Machine Learning Model for Large Healthcare Data Analysis. In: Younas, M., Awan, I., Benbernou, S. (eds) Big Data Innovations and Applications. Innovate-Data 2019. Communications in Computer and Information Science, vol 1054. Springer, Cham. https://doi.org/10.1007/978-3-030-27355-2_7

  • DOI: https://doi.org/10.1007/978-3-030-27355-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27354-5

  • Online ISBN: 978-3-030-27355-2

  • eBook Packages: Computer Science, Computer Science (R0)
