Low-overhead byzantine fault-tolerant storage

Authors:
James Hendricks, Carnegie Mellon University, Pittsburgh, PA
Gregory R. Ganger, Carnegie Mellon University, Pittsburgh, PA
Michael K. Reiter, University of North Carolina at Chapel Hill, Chapel Hill, NC

SOSP '07: Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles, October 2007, Pages 73–86. https://doi.org/10.1145/1294261.1294269

Published: 14 October 2007

ABSTRACT

This paper presents an erasure-coded Byzantine fault-tolerant block storage protocol that is nearly as efficient as protocols that tolerate only crashes. Previous Byzantine fault-tolerant block storage protocols have either relied upon replication, which is inefficient for large blocks of data when tolerating multiple faults, or a combination of additional servers, extra computation, and versioned storage. To avoid these expensive techniques, our protocol employs novel mechanisms to optimize for the common case when faults and concurrency are rare. In the common case, a write operation completes in two rounds of communication and a read completes in one round. The protocol requires a short checksum composed of cryptographic hashes and homomorphic fingerprints. It achieves throughput within 10% of the crash-tolerant protocol for writes and reads in failure-free runs when configured to tolerate up to 6 faulty servers and any number of faulty clients.
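
To make the idea of erasure-coded storage guarded by a hash-based checksum more concrete, the following is a minimal sketch and not the protocol from the paper. It assumes a toy 2-of-3 XOR parity code in place of a general erasure code, uses SHA-256 as the cryptographic hash, omits the homomorphic fingerprints entirely, and assumes the reader already holds a trusted copy of the checksum. The function names (`encode`, `cross_checksum`, `decode`) are hypothetical and chosen only for illustration.

```python
import hashlib


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(block):
    """Toy 2-of-3 erasure code: two data halves plus one XOR parity fragment."""
    if len(block) % 2:
        block += b"\x00"  # pad to even length; the caller keeps the true length
    half = len(block) // 2
    d0, d1 = block[:half], block[half:]
    return [d0, d1, xor_bytes(d0, d1)]


def cross_checksum(fragments):
    """One SHA-256 digest per fragment (hash part only, no fingerprints)."""
    return [hashlib.sha256(f).digest() for f in fragments]


def decode(received, checksum, length):
    """Rebuild the block from any two fragments whose hashes match.

    `received` maps fragment index -> bytes returned by possibly faulty servers.
    """
    good = {i: f for i, f in received.items()
            if hashlib.sha256(f).digest() == checksum[i]}
    if len(good) < 2:
        raise ValueError("fewer than two valid fragments")
    if 0 in good and 1 in good:
        d0, d1 = good[0], good[1]
    elif 0 in good:
        d0 = good[0]
        d1 = xor_bytes(d0, good[2])  # d1 = d0 XOR parity
    else:
        d1 = good[1]
        d0 = xor_bytes(d1, good[2])  # d0 = d1 XOR parity
    return (d0 + d1)[:length]


block = b"byzantine storage"
fragments = encode(block)
checksum = cross_checksum(fragments)

# Simulate one faulty server returning a corrupted copy of fragment 1.
received = {0: fragments[0], 1: b"\xff" * len(fragments[1]), 2: fragments[2]}
recovered = decode(received, checksum, len(block))
print(recovered == block)
```

In this toy setting the three fragments together occupy about 1.5 times the block size, whereas three full replicas would occupy 3 times the block size, which is the kind of saving that motivates erasure coding for large blocks.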

Index Terms

Low-overhead byzantine fault-tolerant storage
1. Information systems
  1. Information retrieval
    1. Search engine architectures and scalability
      1. Distributed retrieval
      2. Peer-to-peer retrieval
  2. Information storage systems
    1. Storage architectures
      1. Distributed storage
2. Software and its engineering
  1. Software organization and properties
    1. Extra-functional properties
      1. Software fault tolerance
    2. Software system structures
      1. Distributed systems organizing principles

Published in
SOSP '07: Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
October 2007
378 pages
ISBN: 9781595935915
DOI: 10.1145/1294261
General Chair: Thomas C. Bressoud, Denison University, USA
Program Chair: M. Frans Kaashoek, Massachusetts Institute of Technology, USA

ACM SIGOPS Operating Systems Review, Volume 41, Issue 6
SOSP '07
December 2007
363 pages
ISSN: 0163-5980
DOI: 10.1145/1323293

Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
byzantine fault-tolerant storage
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 131 of 716 submissions, 18%