MGRM: A Multi-Segment Greedy Rewriting Method to Alleviate Data Fragmentation in Deduplication-Based Cloud Backup Systems


Abstract:

Data deduplication is widely used in the cloud because it saves storage space. Capping methods, which rewrite the data chunks of containers with a low Container Reference Ratio (CRR), have been developed to alleviate data fragmentation in the cloud. From an analysis of real traces, we observe that a number of segments point only to low-CRR containers, while others contain only high-CRR containers. Existing capping methods ignore this observation. To address this problem, we propose a multi-segment greedy rewriting method named MGRM. MGRM sorts the containers of segments sequentially: given the i-th segment currently being processed, MGRM sorts all the containers in the first i segments. This search feature enables MGRM to select and rewrite the true low-reference container set. Moreover, to balance deduplication ratio against restore performance, MGRM has two working modes: an optimal rewriting mode and a radical rewriting mode. In the optimal rewriting mode, MGRM aims to improve the deduplication ratio; in the radical rewriting mode, it strives to improve restore performance. MGRM adaptively switches between the two modes according to the workload. Furthermore, unlike existing capping methods, which improve restore performance at the cost of the deduplication ratio, MGRM pays attention to both aspects. Our extensive experimental results show that MGRM achieves high restore performance coupled with a high deduplication ratio. In particular, compared with the two state-of-the-art schemes FC and FLC, MGRM improves the deduplication ratio and restore performance by up to 114.83% and 99.34%, respectively.
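
The multi-segment idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the function name `low_reference_containers`, the `cap` parameter, and the use of a raw reference count as a stand-in for CRR are all assumptions made for illustration. It only shows how ranking containers over the first i segments can pick a different rewrite set than capping each segment on its own.

```python
from collections import Counter

def low_reference_containers(segments, i, cap):
    """Return the container IDs that would be rewritten (hypothetical helper).

    segments: list of segments; each segment is a list of container IDs,
              one entry per duplicate chunk that points into that container.
    i:        0-based index of the segment currently being processed.
    cap:      assumed limit on how many containers may stay referenced.
    """
    counts = Counter()
    for segment in segments[: i + 1]:
        counts.update(segment)  # reference count stands in for CRR (assumption)
    # Rank by reference count, highest first; break ties by container ID.
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    return set(ranked[cap:])

# One segment that points only to well-referenced containers,
# and one that points only to poorly referenced containers.
segments = [
    ["A"] * 4 + ["B"] * 3 + ["C"] * 3,
    ["D", "E", "F"],
]

# Capping each segment separately (2 containers kept per segment).
per_segment = (low_reference_containers([segments[0]], 0, 2)
               | low_reference_containers([segments[1]], 0, 2))

# Ranking over both segments together (4 containers kept in total).
multi_segment = low_reference_containers(segments, 1, 4)

print(sorted(per_segment))    # ['C', 'F']
print(sorted(multi_segment))  # ['E', 'F']
```

With the same number of containers kept, the per-segment pass rewrites container C (three chunks) plus F, while the multi-segment pass rewrites only E and F (one chunk each). Under these assumptions, this is the kind of difference the abstract attributes to sorting across segments.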
Published in: IEEE Transactions on Cloud Computing (Volume: 11, Issue: 3, 01 July-Sept. 2023)
Page(s): 2503 - 2516
Date of Publication: 17 October 2022
