Abstract
Data synchronization and content delivery services are key to supporting the healthcare dataflows that organizations build. These services must prepare and process data to meet mandatory non-functional requirements such as security and reliability, which is challenging because multiple applications, infrastructures, and platforms participate in healthcare dataflows. This paper presents FedFlow, a federated content distribution platform for building infrastructure-agnostic health data sharing and synchronization services. FedFlow creates secure and efficient data sharing and synchronization patterns for intra-dataflows and inter-dataflows by using implicit parallel data preparation schemes. A prototype of FedFlow was developed for a case study on building inter-dataflows that deliver synchronized health data to multiple organizations, combining algorithms for non-functional requirements to comply with governmental rules on health data management. An experimental evaluation in a multi-cloud federated environment showed that FedFlow is around 90% faster than a traditional pipeline implementation, around 40% faster than Jenkins workflow management, and almost 30% faster than Duplicity.
Acknowledgements
This work has been partially supported by the grant “CABAHLA-CM: Convergencia Big data-HPC: de Los sensores a las Aplicaciones” (Ref: S2018/TCS-4423) of the Madrid Regional Government; by the Spanish Ministry of Science and Innovation project “New Data Intensive Computing Methods for High-End and Edge Computing Platforms (DECIDE)” (Ref: PID2019-107858GB-I00); and by project 41756, “Plataforma tecnológica para la gestión, aseguramiento, intercambio y preservación de grandes volúmenes de datos en salud y construcción de un repositorio nacional de servicios de análisis de datos de salud”, of FORDECYT-PRONACES.
Cite this article
Carrizales-Espinoza, D., Sanchez-Gallegos, D.D., Gonzalez-Compean, J.L. et al. FedFlow: a federated platform to build secure sharing and synchronization services for health dataflows. Computing 105, 1019–1037 (2023). https://doi.org/10.1007/s00607-021-01044-3
DOI: https://doi.org/10.1007/s00607-021-01044-3