Abstract
In this paper we study the problem of reliably storing an archive of versioned data. Specifically, we focus on systems that store the differences (deltas) between subsequent versions rather than the whole objects, a typical model for storing versioned data. For reliability, we propose erasure encoding techniques that exploit the sparsity of information in the deltas while storing them in a distributed back-end storage system, resulting in improved I/O read performance when retrieving the whole versioned archive. Along with the basic techniques, we propose a few optimization heuristics, and evaluate the techniques’ efficacy analytically and with numerical simulations.
Acknowledgments
This work is supported by the MoE Tier-2 grant MOE2013-T2-1-068 “eCode: erasure codes for data center environments”.
Cite this article
Harshan, J., Oggier, F. & Datta, A. Sparsity exploiting erasure coding for distributed storage of versioned data. Computing 98, 1305–1329 (2016). https://doi.org/10.1007/s00607-016-0485-x
DOI: https://doi.org/10.1007/s00607-016-0485-x