Deduplication-aware I/O Buffer Management in the Linux Kernel for Improved I/O Performance and Memory Utilization


Abstract:

The amount of data being produced and consumed keeps increasing. As a result, storage systems can hold a large amount of redundant data. Storing and accessing this duplicate data unnecessarily consumes disk space and I/O bandwidth, respectively. Deduplication techniques are now widely deployed to remove the redundancy. In particular, block-level deduplication solutions have proven effective. These solutions operate at the granularity of data blocks, sit immediately above the disk driver layer, and are designed to be transparent to the upper-layer file system. However, such a design hides block redundancy information from the operating system's page cache, a critical component for avoiding slow disk accesses and improving I/O performance. Consequently, the page cache may hold duplicate data blocks and may read from disk data that is already cached. This wastes memory and compromises I/O efficiency. In this work, we propose Dual-Dedup, a lightweight scheme that makes page cache management aware of lower-layer block deduplication. It discloses the redundancy knowledge detected by the block-level deduplication layer to the page cache, which can then remove redundancy in the cache and avoid unnecessary read requests. We have built a prototype of the system on Linux EXT4. Experimental results demonstrate that Dual-Dedup can significantly improve read performance. As an example, when running an FIO benchmark on a data set with 25% duplicate data, Dual-Dedup improves read throughput by 34%.
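
The idea of a cache that knows about lower-layer deduplication can be illustrated with a small user-space sketch. This is not the Dual-Dedup implementation, which lives in the Linux kernel and is not described in detail here. It is a toy model that assumes the deduplication layer exposes a logical-to-physical block mapping, and every name in it (`DedupAwareCache`, `dedup_map`, `disk`) is hypothetical.

```python
class DedupAwareCache:
    """Toy read cache indexed by physical block rather than logical block.

    Assumption: the deduplication layer shares its logical-to-physical
    mapping with the cache. All names here are illustrative only.
    """

    def __init__(self, dedup_map, disk):
        self.dedup_map = dedup_map  # logical block number -> physical block number
        self.disk = disk            # physical block number -> block contents
        self.pages = {}             # physical block number -> cached contents
        self.disk_reads = 0         # number of simulated disk accesses

    def read(self, logical_block):
        # Resolve the logical block to the deduplicated physical block.
        physical = self.dedup_map[logical_block]
        # Go to "disk" only if no logical block sharing this physical
        # block has been cached already.
        if physical not in self.pages:
            self.disk_reads += 1
            self.pages[physical] = self.disk[physical]
        return self.pages[physical]


# Four logical blocks; logical blocks 0 and 2 hold identical data, so the
# (hypothetical) deduplication layer stores them as physical block 100.
disk = {100: b"A" * 4096, 101: b"B" * 4096, 102: b"C" * 4096}
dedup_map = {0: 100, 1: 101, 2: 100, 3: 102}

cache = DedupAwareCache(dedup_map, disk)
data = [cache.read(lb) for lb in range(4)]
print(cache.disk_reads, len(cache.pages))
```

In this toy run, reading the four logical blocks triggers three simulated disk reads and keeps three pages in memory, because the duplicate block is served from the cache. A cache indexed only by logical block number would perform four reads and hold four pages.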
Date of Conference: 29 January 2020 - 01 February 2020
Date Added to IEEE Xplore: 09 April 2020
ISBN Information:
Print on Demand (PoD) ISSN: 2374-314X
Conference Location: Pattaya, Thailand
