Transparent Online Storage Compression at the Block-Level

Published: 01 May 2012

Abstract

In this work, we examine how transparent block-level compression in the I/O path can improve both the space efficiency and the performance of online storage. We present ZBD, a block-layer driver that transparently compresses and decompresses data as they flow between the file system and storage devices. Our system provides support for variable-size blocks, metadata caching, and persistence, as well as block allocation and cleanup. ZBD aims to maintain high performance by mitigating compression and decompression overheads, which can have a significant impact on performance; it does so by leveraging modern multicore CPUs through explicit work scheduling. We present two case studies for compression. First, we examine how our approach can be used to increase the capacity of SSD-based caches, thus increasing their cost-effectiveness. Then, we examine how ZBD can improve the efficiency of online disk-based storage systems.
We evaluate our approach in the Linux kernel on a commodity server with multicore CPUs, using PostMark, SPECsfs2008, TPC-C, and TPC-H. Preliminary results show that transparent online block-level compression is a viable option for improving effective storage capacity: it can improve I/O performance by up to 80% by reducing I/O traffic and seek distance, and it degrades performance, by up to 34%, only when single-thread I/O latency is critical. In particular, for SSD-based caching, our results indicate that, in line with current technology trends, compressed caching trades off CPU utilization for performance and enhances SSD efficiency as a storage cache by up to 99%.
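The mechanism summarized in the abstract (fixed-size logical blocks stored as variable-size compressed extents, a logical-to-physical map, and space cleanup) can be illustrated with a minimal user-space sketch. This is not ZBD's implementation: the class `CompressedBlockStore`, its methods, the 4 KiB `BLOCK_SIZE`, the append-only in-memory log, and the use of zlib as the codec are all assumptions made for illustration only.

```python
import zlib

BLOCK_SIZE = 4096  # assumed logical block size seen by the file system


class CompressedBlockStore:
    """Toy model (hypothetical, not ZBD): fixed-size logical blocks are
    stored as variable-size compressed extents in an append-only log."""

    def __init__(self):
        self.log = bytearray()  # stands in for the physical device
        self.extent_map = {}    # logical block number -> (offset, length, compressed)
        self.live_bytes = 0     # bytes still referenced by the map

    def write_block(self, lbn, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("logical writes must be exactly one block")
        payload = zlib.compress(data)
        compressed = len(payload) < BLOCK_SIZE
        if not compressed:
            payload = bytes(data)  # incompressible data is stored raw
        old = self.extent_map.get(lbn)
        if old is not None:
            self.live_bytes -= old[1]  # the old extent becomes garbage
        self.extent_map[lbn] = (len(self.log), len(payload), compressed)
        self.log += payload
        self.live_bytes += len(payload)

    def read_block(self, lbn):
        entry = self.extent_map.get(lbn)
        if entry is None:
            return bytes(BLOCK_SIZE)  # unwritten blocks read as zeros
        offset, length, compressed = entry
        payload = bytes(self.log[offset:offset + length])
        return zlib.decompress(payload) if compressed else payload

    def cleanup(self):
        """Copy live extents into a fresh log, dropping overwritten data."""
        new_log = bytearray()
        new_map = {}
        for lbn, (offset, length, compressed) in self.extent_map.items():
            new_map[lbn] = (len(new_log), length, compressed)
            new_log += self.log[offset:offset + length]
        self.log, self.extent_map = new_log, new_map


store = CompressedBlockStore()
text_block = (b"block-level compression " * 200)[:BLOCK_SIZE]
store.write_block(0, text_block)         # compressible data shrinks on "disk"
store.write_block(0, bytes(BLOCK_SIZE))  # overwrite leaves garbage in the log
store.cleanup()                          # cleanup reclaims the garbage
```

Because compressed extents vary in size, an overwrite cannot simply reuse the old location, which is why the sketch appends new extents and reclaims space in a separate cleanup pass. A real block-layer driver would also have to persist the map and schedule compression work across CPU cores, which this sketch omits.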

Published In

ACM Transactions on Storage, Volume 8, Issue 2
May 2012
89 pages
ISSN:1553-3077
EISSN:1553-3093
DOI:10.1145/2180905

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 2012
Accepted: 01 October 2011
Revised: 01 July 2011
Received: 01 March 2011
Published in TOS Volume 8, Issue 2

Author Tags

  1. Block-level compression
  2. SSD-based I/O cache

Qualifiers

  • Research-article
  • Research
  • Refereed
