Crocus: Enabling Computing Resource Orchestration for Inline Cluster-Wide Deduplication on Scalable Storage Systems


Abstract:

Inline deduplication dramatically improves storage space utilization. However, it degrades I/O throughput because of compute-intensive deduplication operations in the I/O path, such as chunking, fingerprinting (hashing of chunk content), and redundant lookup I/Os over the network. Fingerprint generation is computationally expensive and is a major contributor to the degraded I/O throughput. In this article, we propose CROCUS, a framework that enables compute resource orchestration to enhance cluster-wide deduplication performance. CROCUS takes into account all compute resources, such as local and remote {CPU, GPU}, by managing decentralized compute pools. An opportunistic Load-Aware Fingerprint Scheduler (LAFS) distributes and offloads compute-intensive deduplication operations to the compute pools in a load-aware fashion. CROCUS is highly generic and can be adopted in both inline and offline deduplication with different storage tier configurations. We implemented CROCUS in the Ceph scale-out storage system. Our extensive evaluation shows that CROCUS reduces the fingerprinting overhead by 86 percent with a 4 KB chunk size compared to Ceph with baseline deduplication, while maintaining high disk-space savings. Our proposed LAFS scheduler, when tested in different internal and external contention scenarios, also showed a 54 percent improvement over a fixed or static scheduling approach.
Published in: IEEE Transactions on Parallel and Distributed Systems (Volume: 31, Issue: 8, 01 August 2020)
Page(s): 1740 - 1753
Date of Publication: 11 February 2020
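
The abstract describes chunking, fingerprinting, and load-aware offloading to compute pools only at a high level. The following is a minimal sketch of that general idea, not the CROCUS or LAFS implementation. It assumes fixed-size 4 KB chunks, SHA-256 as the fingerprint function, and a simple "smallest estimated wait" rule for choosing a pool. The names ComputePool, pick_pool, and fingerprint_all are hypothetical, and the hashing runs locally; the pool assignment is only simulated.

```python
import hashlib
from dataclasses import dataclass

CHUNK_SIZE = 4096  # 4 KB, the chunk size quoted in the abstract


@dataclass
class ComputePool:
    """Hypothetical stand-in for a local or remote CPU/GPU compute pool."""
    name: str
    pending_bytes: int = 0      # work already queued on this pool
    bytes_per_sec: float = 1.0  # assumed hashing throughput of the pool

    def estimated_wait(self, extra_bytes: int) -> float:
        # Time to drain the queue if extra_bytes were added to it.
        return (self.pending_bytes + extra_bytes) / self.bytes_per_sec


def chunk(data: bytes, size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def pick_pool(pools, nbytes):
    """Load-aware choice (assumed rule): smallest estimated wait wins."""
    return min(pools, key=lambda p: p.estimated_wait(nbytes))


def fingerprint_all(data, pools):
    """Assign each chunk to a pool and return (pool name, digest) pairs."""
    results = []
    for c in chunk(data):
        pool = pick_pool(pools, len(c))
        pool.pending_bytes += len(c)  # record the simulated assignment
        # SHA-256 is an assumption; the abstract does not name a hash.
        results.append((pool.name, hashlib.sha256(c).hexdigest()))
    return results


def unique_fingerprints(results):
    """Distinct digests; duplicate chunks collapse to a single entry."""
    return {digest for _, digest in results}
```

Calling fingerprint_all on a buffer with repeated 4 KB blocks yields one digest per chunk, and unique_fingerprints shows how many chunks would actually need to be stored. A static scheduler would correspond to always returning the same pool from pick_pool, which is the kind of baseline the abstract compares against.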
