The Design of a Lossless Deduplication Scheme to Eliminate Fine-Grained Redundancy for JPEG Image Storage Systems


Abstract:

As image data storage grows explosively, image deduplication is used to save storage space by eliminating redundancy between different images. However, traditional image deduplication can neither eliminate fine-grained redundancy nor guarantee lossless results. In this work, we propose imDedup, a lossless and fine-grained deduplication scheme for JPEG image storage systems. Specifically, imDedup uses a novel sampling hash method, Feature Bitmap, to detect similar images quickly by exploiting the information distribution of JPEG data. Meanwhile, it uses Idelta, a novel delta encoder that incorporates image compression into deduplication, to ensure that the non-redundant data can be re-compressed via image encoding, thereby improving the compression ratio. In addition, we propose the DCHash and Fixed-Point Matching (FPM) techniques to further speed up Idelta. We also propose imDedup-plus, which dynamically chooses between the DCHash-based and FPM-based compressors to achieve higher throughput without sacrificing the compression ratio. Experimental results on five datasets demonstrate the superiority of the imDedup-based methods. Compared with the state-of-the-art similarity detector and delta encoder, imDedup achieves 1.8–4.4× higher throughput and 1.3–1.7× higher compression ratios, respectively. Moreover, imDedup-plus achieves a further 1.3–2.9× higher throughput than imDedup without sacrificing the compression ratio.
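
The abstract describes Idelta only at a high level. As a rough point of reference for what a delta encoder does, the following is a minimal sketch of generic byte-level delta encoding: the target is expressed as COPY instructions (ranges reused from a similar base) plus INSERT instructions (non-redundant literal bytes), and decoding rebuilds the target exactly. It assumes raw byte strings and a hypothetical 8-byte minimum match length (`K`); the names `delta_encode` and `delta_decode` are illustrative. This is not Idelta, which per the abstract also re-compresses the non-redundant data via image encoding, and it does not model Feature Bitmap, DCHash, or FPM.

```python
from typing import List, Tuple, Union

# An instruction is either ("COPY", base_offset, length) or ("INSERT", literal_bytes).
Instr = Union[Tuple[str, int, int], Tuple[str, bytes]]

K = 8  # hypothetical minimum match length in bytes (also the index key size)


def delta_encode(base: bytes, target: bytes, k: int = K) -> List[Instr]:
    """Greedy delta encoding of `target` against `base` (generic sketch, not Idelta)."""
    # Index the first occurrence of every k-byte substring of the base.
    index = {}
    for i in range(len(base) - k + 1):
        index.setdefault(base[i:i + k], i)

    instrs: List[Instr] = []
    literal = bytearray()
    pos = 0
    while pos < len(target):
        start = index.get(target[pos:pos + k]) if pos + k <= len(target) else None
        if start is None:
            # No match here: this byte is non-redundant and is stored literally.
            literal.append(target[pos])
            pos += 1
            continue
        # The first k bytes already match; extend forward while base and target agree.
        length = k
        while (start + length < len(base) and pos + length < len(target)
               and base[start + length] == target[pos + length]):
            length += 1
        if literal:
            instrs.append(("INSERT", bytes(literal)))
            literal.clear()
        instrs.append(("COPY", start, length))
        pos += length
    if literal:
        instrs.append(("INSERT", bytes(literal)))
    return instrs


def delta_decode(base: bytes, instrs: List[Instr]) -> bytes:
    """Rebuild the target exactly (losslessly) from the base and the instructions."""
    out = bytearray()
    for ins in instrs:
        if ins[0] == "COPY":
            _, off, length = ins
            out += base[off:off + length]
        else:
            out += ins[1]
    return bytes(out)


base = b"The quick brown fox jumps over the lazy dog."
target = b"The quick brown cat jumps over the lazy dog!"
instrs = delta_encode(base, target)
print(instrs)  # [('COPY', 0, 16), ('INSERT', b'cat'), ('COPY', 19, 24), ('INSERT', b'!')]
assert delta_decode(base, instrs) == target
```

Only the INSERT payloads (here `b"cat"` and `b"!"`) are new data; everything else is a reference into the base. In a fine-grained image deduplication setting, those leftover bytes correspond to the non-redundant data that the abstract says Idelta arranges to be re-compressed by image encoding.
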
Published in: IEEE Transactions on Computers (Volume: 73, Issue: 5, May 2024)
Pages: 1385–1399
Date of Publication: 07 February 2024