Abstract
Healthcare is a data-intensive industry in which data mining has great potential to improve patient wellbeing. However, many barriers impede the application of machine learning. This work focuses on medical adverse event prediction by domain experts. We present AutoCrisp, a self-service data science prototype for multivariate sequential classification on electronic healthcare records, which enables domain experts to perform such analyses without sophisticated data mining knowledge. We performed an empirical case study in which AutoCrisp was used to predict bleeding events. Our results show that multivariate sequential classification for medical adverse event prediction can be made accessible to healthcare professionals when appropriate tooling support is provided.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Spruit, M., de Vries, N. (2021). Self-Service Data Science for Adverse Event Prediction in Electronic Healthcare Records. In: Visvizi, A., Lytras, M.D., Aljohani, N.R. (eds) Research and Innovation Forum 2020. RIIFORUM 2020. Springer Proceedings in Complexity. Springer, Cham. https://doi.org/10.1007/978-3-030-62066-0_39
DOI: https://doi.org/10.1007/978-3-030-62066-0_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62065-3
Online ISBN: 978-3-030-62066-0
eBook Packages: Education, Education (R0)