Enabling Real-World Medicine with Data Lake Federation: A Research Perspective

  • Conference paper
Heterogeneous Data Management, Polystores, and Analytics for Healthcare (DMAH 2022, Poly 2022)

Abstract

The collection of data during the routine delivery of care is changing the healthcare sector. Indeed, clinical trial data alone can hardly provide a picture of a patient's status as complete as the one offered by real-world data. However, creating valuable real-world evidence requires an appropriate solution to ingest, store, and process the enormous amount of information coming from all the involved, typically heterogeneous, data sources.

Data lake technologies are depicted as promising solutions for enhancing data management and analysis capabilities in the healthcare domain: we can rely on them to manage the complexity of big data volume and variety, providing data analysts with a self-service environment in which advanced analytics can be applied. In this paper we envision the adoption of a data lake federation through which organizations could achieve further benefits by sharing data. Exchanging data raises new research challenges related to guaranteeing data reliability and sovereignty. For instance, the collected data should be accurately described in order to document their quality, facilitate their discovery, and define security and privacy policies. Based on the experience gained in the Health Big Data project, we present an architecture for gathering real-world evidence and identify the related research challenges from an IT perspective.

Notes

  1. https://www.alleanzacontroilcancro.it/progetti/health-big-data/

  2. For privacy purposes it is not possible at this stage to give additional information about the case study, such as, for instance, the actors involved or details on the considered population.

  3. For simplicity, we can suppose that cold storage will be based on the cloud.

Acknowledgment

This work has been partially supported by the Health Big Data Project (CCR-2018-23669122), funded by the Italian Ministry of Economy and Finance and coordinated by the Italian Ministry of Health and the network Alleanza Contro il Cancro.

Author information

Corresponding author

Correspondence to Cinzia Cappiello.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cappiello, C., Gribaudo, M., Plebani, P., Salnitri, M., Tanca, L. (2022). Enabling Real-World Medicine with Data Lake Federation: A Research Perspective. In: Rezig, E.K., et al. (eds.) Heterogeneous Data Management, Polystores, and Analytics for Healthcare. DMAH 2022, Poly 2022. Lecture Notes in Computer Science, vol 13814. Springer, Cham. https://doi.org/10.1007/978-3-031-23905-2_4

  • DOI: https://doi.org/10.1007/978-3-031-23905-2_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23904-5

  • Online ISBN: 978-3-031-23905-2

  • eBook Packages: Computer Science (R0)
