Abstract
The development of technologies such as cloud computing, IoT, and social networks has caused the amount of data generated daily to grow at an incredible rate, giving rise to the trend of Big Data. Big Data has also emerged in the healthcare field, thanks to the introduction of new tools that produce massive amounts of structured and unstructured data. For this reason, medical institutions are moving towards data-driven healthcare, with the goal of leveraging this data to support clinical decision-making through suitable information systems. The performance of these systems, in turn, needs to be evaluated. One commonly used technique is modeling, which consists of evaluating a model of the system under analysis without actually implementing the system. However, an adequate performance assessment of Big Data systems requires data with a variety of volumes and velocities, which, due to the sensitivity of healthcare data, is not available. While in other fields this problem is usually solved with synthetic data generators, in healthcare these are few and not specialized for performance evaluation. Therefore, this work focuses on the creation of a synthetic data generator for evaluating the performance of a Big Data system model for healthcare. The dataset used as a reference for creating the generator is MIMIC-III, which contains the digital health records of thousands of patients collected over a time span of multiple years. First, we analyze the dataset, adopting multiple distribution fitting techniques (e.g., phase-type fitting) to model the temporal distribution of the data. Then, we develop a generator structured as a multi-module library, so that each component can be customized; specifically, we propose a multiformalism model to reproduce patient behavior inside the hospital. Finally, we test the generator by evaluating its performance in different scenarios. Through these experiments, we show the granular control that the generator offers over the synthetic data produced, and the simplicity with which it can be adapted to different uses.
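To make the fitting-then-generation idea concrete, the following is a minimal sketch and not the authors' generator. It assumes that the temporal distribution of interest is the time between consecutive events, and it fits a two-phase hyperexponential (one of the simplest phase-type distributions) with a plain EM loop. The function names, the choice of EM, and the stand-in input data are all hypothetical.

```python
import math
import random


def fit_hyperexponential(samples, phases=2, iterations=100):
    """Fit a mixture of exponentials (a simple phase-type distribution) by EM.

    Hypothetical helper: returns (weights, rates), one pair per phase.
    """
    mean = sum(samples) / len(samples)
    # Start with distinct rates around 1/mean so the phases can separate.
    rates = [(2.0 ** k) / mean for k in range(phases)]
    weights = [1.0 / phases] * phases
    for _ in range(iterations):
        resp_sum = [0.0] * phases  # sum of responsibilities per phase
        resp_x = [0.0] * phases    # responsibility-weighted sum of samples
        for x in samples:
            dens = [w * r * math.exp(-r * x) for w, r in zip(weights, rates)]
            total = sum(dens)
            if total == 0.0:
                continue  # numerically negligible sample, skip it
            for k in range(phases):
                g = dens[k] / total
                resp_sum[k] += g
                resp_x[k] += g * x
        norm = sum(resp_sum)
        weights = [s / norm for s in resp_sum]
        rates = [s / sx for s, sx in zip(resp_sum, resp_x)]
    return weights, rates


def synthetic_timestamps(weights, rates, count, rng):
    """Generate increasing event times by sampling the fitted distribution."""
    t, out = 0.0, []
    for _ in range(count):
        k = rng.choices(range(len(rates)), weights=weights)[0]  # pick a phase
        t += rng.expovariate(rates[k])                          # sample its delay
        out.append(t)
    return out


rng = random.Random(7)
# Stand-in for observed inter-arrival times: 70% short gaps, 30% long gaps.
observed = [rng.expovariate(2.0) if rng.random() < 0.7 else rng.expovariate(0.2)
            for _ in range(2000)]
weights, rates = fit_hyperexponential(observed)
timestamps = synthetic_timestamps(weights, rates, count=500, rng=rng)
```

Under these assumptions, multiplying every fitted rate by a constant factor compresses the synthetic timeline by that factor while keeping the shape of the distribution, which is one simple way to obtain workloads with different velocities from a single fit.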
Acknowledgments
This work has been partially supported by the Health Big Data Project (CCR-2018-23669122), funded by the Italian Ministry of Economy and Finance and coordinated by the Italian Ministry of Health and the network Alleanza Contro il Cancro. Additionally, we are grateful to Letizia Tanca and Giuseppe Serazzi for their advice during the definition of this work and the support in the revision of this paper.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Covioli, T., Dolci, T., Azzalini, F., Piantella, D., Barbierato, E., Gribaudo, M. (2023). Workflow Characterization of a Big Data System Model for Healthcare Through Multiformalism. In: Iacono, M., Scarpa, M., Barbierato, E., Serrano, S., Cerotti, D., Longo, F. (eds) Computer Performance Engineering and Stochastic Modelling. EPEW ASMTA 2023 2023. Lecture Notes in Computer Science, vol 14231. Springer, Cham. https://doi.org/10.1007/978-3-031-43185-2_19
DOI: https://doi.org/10.1007/978-3-031-43185-2_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43184-5
Online ISBN: 978-3-031-43185-2
eBook Packages: Computer Science, Computer Science (R0)