Abstract
The collection of data during the routine delivery of care is changing the healthcare sector. Clinical trial data alone rarely provide as complete a picture of a patient's status as real-world data do. However, creating valuable real-world evidence requires an appropriate solution to ingest, store, and process the enormous amount of information coming from all the involved data sources, which are typically heterogeneous.
Data lake technologies are regarded as promising solutions for enhancing data management and analysis capabilities in the healthcare domain: they can manage the volume and variety of big data while providing data analysts with a self-service environment in which advanced analytics can be applied. In this paper, we envision the adoption of a data lake federation through which organizations could achieve further benefits by sharing data. Exchanging data raises new research challenges related to guaranteeing data reliability and sovereignty. For instance, the collected data should be accurately described in order to document their quality, facilitate their discovery, and define security and privacy policies. Drawing on our experience in the Health Big Data project, we present an architecture for gathering real-world evidence and identify the related research challenges from an IT perspective.
Notes
- 1.
- 2. For privacy purposes it is not possible at this stage to give additional information about the case study, such as, for instance, the actors involved or details on the considered population.
- 3. For simplicity, we can suppose that cold storage will be based on the cloud.
Acknowledgment
This work has been partially supported by the Health Big Data Project (CCR-2018-23669122), funded by the Italian Ministry of Economy and Finance and coordinated by the Italian Ministry of Health and the network Alleanza Contro il Cancro.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cappiello, C., Gribaudo, M., Plebani, P., Salnitri, M., Tanca, L. (2022). Enabling Real-World Medicine with Data Lake Federation: A Research Perspective. In: Rezig, E.K., et al. Heterogeneous Data Management, Polystores, and Analytics for Healthcare. DMAH Poly 2022 2022. Lecture Notes in Computer Science, vol 13814. Springer, Cham. https://doi.org/10.1007/978-3-031-23905-2_4
DOI: https://doi.org/10.1007/978-3-031-23905-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23904-5
Online ISBN: 978-3-031-23905-2
eBook Packages: Computer Science (R0)