ESDedup: An efficient and secure deduplication scheme based on data similarity and blockchain for cloud-assisted medical storage systems

The Journal of Supercomputing

Abstract

With the explosive growth of medical data, storing such data in the cloud has become widespread. However, large-scale medical data put great pressure on cloud storage systems, since redundant data waste storage space and increase cost. Security is also critical for medical data stored in the cloud. To reduce redundancy and ensure the security of medical data simultaneously, this study proposes ESDedup, an efficient and secure deduplication scheme. Unlike existing works, it summarizes the redundancy of medical data and presents, for the first time, denoising for deduplication to decrease storage overhead. A rewriting algorithm based on similarity rather than the time attribute then eliminates more fragments. In addition, a blockchain-based auditing strategy is designed to improve auditing efficiency and security. Experiments demonstrate that ESDedup not only improves scalability and system performance but also improves the deduplication ratio by 55.9% compared with the state-of-the-art method.
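
To make the term "deduplication ratio" concrete, here is a minimal sketch of generic fingerprint-based deduplication. It is not the ESDedup algorithm: the fixed-size chunking, the SHA-256 fingerprints, the function name `dedup_ratio`, and the definition of the ratio as the fraction of bytes eliminated are all assumptions made for illustration, and the paper may define the metric differently.

```python
import hashlib


def dedup_ratio(data: bytes, chunk_size: int = 8) -> float:
    """Return the fraction of bytes eliminated by exact-match deduplication.

    Illustrative only: fixed-size chunks and SHA-256 fingerprints are
    assumptions, not the chunking or similarity method used by ESDedup.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not data:
        return 0.0

    seen = set()      # fingerprints of chunks already stored
    stored_bytes = 0  # bytes that must actually be written

    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        fingerprint = hashlib.sha256(chunk).digest()
        if fingerprint not in seen:
            # First occurrence of this chunk: store it.
            seen.add(fingerprint)
            stored_bytes += len(chunk)
        # Repeated chunks are replaced by a reference and cost no storage here.

    return 1 - stored_bytes / len(data)


# Three identical 8-byte chunks plus one unique chunk: 16 of 32 bytes are stored.
sample = b"ABCDEFGH" * 3 + b"12345678"
print(dedup_ratio(sample))
```

Exact-match fingerprinting of this kind only removes identical chunks; the similarity-based approach described in the abstract targets redundancy that exact matching would miss.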

Data availability

The Influenza, Pathogen, and Interpro datasets analyzed during the current study are available in the repositories at ftp://download.nmdc.cn/Influenza/, ftp://download.nmdc.cn/pathogen/, and ftp://download.nmdc.cn/interpro/, respectively. The Emotions dataset is available from PhysioNet at https://doi.org/10.13026/cdb3-8925, but restrictions apply: the data were used under license for this study and are available with the permission of PhysioNet.

Acknowledgements

This research is supported by the National Key R&D Program of China (No. 2018AAA0102100), the Fundamental Research Funds for the Central Universities of Central South University (No. 2020zzts143), and the Scientific and Technological Innovation Leading Plan of High-tech Industry of Hunan Province (No. 2020GK2021).

Author information

Corresponding author

Correspondence to Chengzhang Zhu.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xiao, L., Zou, B., Zhu, C. et al. ESDedup: An efficient and secure deduplication scheme based on data similarity and blockchain for cloud-assisted medical storage systems. J Supercomput 79, 2932–2960 (2023). https://doi.org/10.1007/s11227-022-04746-3

  • DOI: https://doi.org/10.1007/s11227-022-04746-3
