Workflow Characterization of a Big Data System Model for Healthcare Through Multiformalism

  • Conference paper
Computer Performance Engineering and Stochastic Modelling (EPEW 2023, ASMTA 2023)

Abstract

The development of technologies such as cloud computing, IoT, and social networks has caused the amount of data generated daily to grow at an incredible rate, giving rise to the trend of Big Data. Big Data has also emerged in the healthcare field, thanks to the introduction of new tools that produce massive amounts of structured and unstructured data. For this reason, medical institutions are moving towards data-based healthcare, with the goal of leveraging this data to support clinical decision-making through suitable information systems. This comes with the need to evaluate the performance of such systems. One commonly used technique is modeling, which consists of evaluating a model of the system under analysis without actually implementing the system. However, an adequate performance assessment of Big Data systems requires data with a diversity of volumes and speeds that, due to the sensitivity of healthcare data, is not available. While in other fields this problem is usually solved through the use of synthetic data generators, in healthcare these are few and not specialized in performance evaluation. Therefore, this work focuses on the creation of a synthetic data generator for evaluating the performance of a Big Data system model for healthcare. The dataset used as a reference for creating the generator is MIMIC-III, which contains the digital health records of thousands of patients collected over a time span of multiple years. First, we analyze the dataset, adopting multiple distribution fitting techniques (e.g., phase-type fitting) to model the temporal distribution of the data. Then, we develop a generator structured as a multi-module library to allow the customization of each component. Specifically, we propose a multiformalism model to reproduce the patient behavior inside the hospital. Finally, we test the generator by evaluating its performance in different scenarios. Through these experiments, we show the granular control that the generator offers over the synthetic data produced, and the simplicity with which it can be adapted to different uses.
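
As an illustration of the kind of temporal model the abstract refers to, the following minimal sketch draws synthetic inter-arrival times from a two-branch hyperexponential distribution, one of the simplest phase-type distributions. It is not the authors' generator and does not use MIMIC-III: the function names, branch probabilities, and rates are hypothetical values chosen only to show the idea. In practice such parameters would come from a fitting step on real data.

```python
import random


def sample_hyperexponential(probs, rates, rng):
    """Draw one sample from a hyperexponential distribution.

    A branch i is chosen with probability probs[i]; the sample is then
    exponentially distributed with rate rates[i].
    """
    branch = rng.choices(range(len(rates)), weights=probs, k=1)[0]
    return rng.expovariate(rates[branch])


def generate_arrival_times(n, probs, rates, seed=0):
    """Return n non-decreasing synthetic arrival timestamps.

    Inter-arrival times are independent hyperexponential samples.
    The seed makes the synthetic trace reproducible.
    """
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(n):
        t += sample_hyperexponential(probs, rates, rng)
        times.append(t)
    return times


def hyperexponential_mean(probs, rates):
    """Theoretical mean inter-arrival time: sum of p_i / rate_i."""
    return sum(p / r for p, r in zip(probs, rates))


if __name__ == "__main__":
    # Hypothetical parameters: 70% short gaps (rate 2.0), 30% long gaps (rate 0.1).
    probs, rates = [0.7, 0.3], [2.0, 0.1]
    arrivals = generate_arrival_times(10_000, probs, rates, seed=42)
    empirical_mean = arrivals[-1] / len(arrivals)
    print("theoretical mean gap:", hyperexponential_mean(probs, rates))
    print("empirical mean gap:  ", empirical_mean)
```

Mixing a fast and a slow exponential branch gives a higher variability than a single exponential with the same mean, which is why phase-type distributions of this kind are often used to approximate bursty workloads. Changing the probabilities and rates is one simple way to vary the volume and speed of the generated trace.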

Acknowledgments

This work has been partially supported by the Health Big Data Project (CCR-2018-23669122), funded by the Italian Ministry of Economy and Finance and coordinated by the Italian Ministry of Health and the network Alleanza Contro il Cancro. Additionally, we are grateful to Letizia Tanca and Giuseppe Serazzi for their advice during the definition of this work and the support in the revision of this paper.

Author information

Corresponding author

Correspondence to Enrico Barbierato.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Covioli, T., Dolci, T., Azzalini, F., Piantella, D., Barbierato, E., Gribaudo, M. (2023). Workflow Characterization of a Big Data System Model for Healthcare Through Multiformalism. In: Iacono, M., Scarpa, M., Barbierato, E., Serrano, S., Cerotti, D., Longo, F. (eds) Computer Performance Engineering and Stochastic Modelling. EPEW 2023, ASMTA 2023. Lecture Notes in Computer Science, vol 14231. Springer, Cham. https://doi.org/10.1007/978-3-031-43185-2_19

  • DOI: https://doi.org/10.1007/978-3-031-43185-2_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43184-5

  • Online ISBN: 978-3-031-43185-2

  • eBook Packages: Computer Science, Computer Science (R0)
