Self-Service Data Science for Adverse Event Prediction in Electronic Healthcare Records

  • Conference paper
Research and Innovation Forum 2020 (RIIFORUM 2020)

Part of the book series: Springer Proceedings in Complexity (SPCOM)

Abstract

Healthcare is a data-intensive industry in which data mining has great potential for improving the wellbeing of patients. However, a multitude of barriers impedes the application of machine learning. This work focuses on medical adverse event prediction by domain experts. We present AutoCrisp, a self-service data science prototype for multivariate sequential classification on electronic healthcare records, which lets domain experts carry out such analyses without sophisticated data mining knowledge. We performed an empirical case study in which AutoCrisp was used to predict bleedings. Our results show that multivariate sequential classification for medical adverse event prediction can indeed be made accessible to healthcare professionals by providing appropriate tooling support.

Author information

Corresponding author

Correspondence to Marco Spruit.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Spruit, M., de Vries, N. (2021). Self-Service Data Science for Adverse Event Prediction in Electronic Healthcare Records. In: Visvizi, A., Lytras, M.D., Aljohani, N.R. (eds) Research and Innovation Forum 2020. RIIFORUM 2020. Springer Proceedings in Complexity. Springer, Cham. https://doi.org/10.1007/978-3-030-62066-0_39

  • DOI: https://doi.org/10.1007/978-3-030-62066-0_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62065-3

  • Online ISBN: 978-3-030-62066-0

  • eBook Packages: Education, Education (R0)
