Abstract
With the explosive growth of medical data, storing such data in the cloud has become widespread. However, large-scale medical data put great pressure on cloud storage systems, since redundant data wastes storage space and increases cost. Security is also critical for medical data stored in the cloud. To reduce redundancy and ensure the security of medical data simultaneously, this study proposes ESDedup, an efficient and secure deduplication scheme. Unlike existing work, ESDedup characterizes the redundancy in medical data and, for the first time, introduces denoising into deduplication to reduce storage overhead. A rewriting algorithm based on similarity rather than the time attribute then eliminates more fragments. In addition, a blockchain-based auditing strategy is designed to improve auditing efficiency and security. Experiments demonstrate that ESDedup not only improves scalability and system performance, but also raises the deduplication ratio by 55.9% compared with the state-of-the-art method.
Data availability
The Influenza, Pathogen and Interpro datasets analyzed during the current study are available at ftp://download.nmdc.cn/Influenza/, ftp://download.nmdc.cn/pathogen/, and ftp://download.nmdc.cn/interpro/, respectively. The Emotions dataset was used under license for this study, and restrictions apply to its availability; it can be obtained from https://doi.org/10.13026/cdb3-8925 with the permission of PhysioNet.
Acknowledgements
This research is supported by the National Key R&D Program of China (No. 2018AAA0102100), the Fundamental Research Funds for the Central Universities of Central South University (No. 2020zzts143), and the Scientific and Technological Innovation Leading Plan of High-tech Industry of Hunan Province (No. 2020GK2021).
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Xiao, L., Zou, B., Zhu, C. et al. ESDedup: An efficient and secure deduplication scheme based on data similarity and blockchain for cloud-assisted medical storage systems. J Supercomput 79, 2932–2960 (2023). https://doi.org/10.1007/s11227-022-04746-3
DOI: https://doi.org/10.1007/s11227-022-04746-3