Self-Service Data Science for Adverse Event Prediction in Electronic Healthcare Records

Spruit, Marco; de Vries, Niels

doi:10.1007/978-3-030-62066-0_39

Part of the book series: Springer Proceedings in Complexity ((SPCOM))

Included in the following conference series:

The International Research & Innovation Forum

Abstract

Healthcare is a data-intensive industry in which data mining has great potential for improving the wellbeing of patients. However, a multitude of barriers impedes the application of machine learning. This work focuses on medical adverse event prediction by domain experts. We present AutoCrisp, a self-service data science prototype for multivariate sequential classification on electronic healthcare records that lets domain experts perform such analyses without requiring sophisticated data mining knowledge. We performed an empirical case study with the objective of predicting bleedings using AutoCrisp. Our results show that multivariate sequential classification for medical adverse event prediction can indeed be made accessible to healthcare professionals by providing appropriate tooling support.

Author information

Authors and Affiliations

Department of Information and Computing Sciences, Utrecht University, Princetonplein 5, 3584 CC, Utrecht, The Netherlands
Marco Spruit & Niels de Vries
Department of Public Health and Primary Care, Leiden University Medical Centre, Albinusdreef 2, 2333 ZA, Leiden, The Netherlands
Marco Spruit

Authors

Marco Spruit
Niels de Vries

Corresponding author

Correspondence to Marco Spruit.

Editor information

Editors and Affiliations

Research & Innovation Institute (Rii), Warsaw, Poland
Anna Visvizi
Effat University, Jeddah, Saudi Arabia
Miltiadis D. Lytras
Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
Naif R. Aljohani

About this paper

Cite this paper

Spruit, M., de Vries, N. (2021). Self-Service Data Science for Adverse Event Prediction in Electronic Healthcare Records. In: Visvizi, A., Lytras, M.D., Aljohani, N.R. (eds) Research and Innovation Forum 2020. RIIFORUM 2020. Springer Proceedings in Complexity. Springer, Cham. https://doi.org/10.1007/978-3-030-62066-0_39

DOI: https://doi.org/10.1007/978-3-030-62066-0_39
Published: 12 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62065-3
Online ISBN: 978-3-030-62066-0
eBook Packages: Education, Education (R0)
