ABSTRACT
This paper presents an erasure-coded Byzantine fault-tolerant block storage protocol that is nearly as efficient as protocols that tolerate only crashes. Previous Byzantine fault-tolerant block storage protocols have relied either upon replication, which is inefficient for large blocks of data when tolerating multiple faults, or upon a combination of additional servers, extra computation, and versioned storage. To avoid these expensive techniques, our protocol employs novel mechanisms to optimize for the common case when faults and concurrency are rare. In the common case, a write operation completes in two rounds of communication and a read completes in one round. The protocol requires a short checksum composed of cryptographic hashes and homomorphic fingerprints. It achieves throughput within 10% of the crash-tolerant protocol for writes and reads in failure-free runs when configured to tolerate up to 6 faulty servers and any number of faulty clients.
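To illustrate why a homomorphic fingerprint is useful alongside an erasure code, the following is a minimal sketch, not the paper's construction. It assumes a prime field GF(P), a fingerprint defined as polynomial evaluation at a point r, and a parity fragment formed as a linear combination of data fragments; the names `fingerprint`, `combine`, and `check_parity` are hypothetical. Because the fingerprint is linear, a parity fragment can be checked against the fingerprints of the data fragments without access to the data fragments themselves.

```python
# Assumption: arithmetic in the prime field GF(P); chosen here only for simplicity.
P = 2**61 - 1  # a Mersenne prime


def fingerprint(fragment, r):
    """Treat the fragment's symbols as polynomial coefficients and evaluate at r (mod P)."""
    acc = 0
    for symbol in reversed(fragment):  # Horner's rule
        acc = (acc * r + symbol) % P
    return acc


def combine(fragments, coeffs):
    """Symbol-wise linear combination of equal-length fragments (mod P)."""
    length = len(fragments[0])
    return [
        sum(c * f[i] for c, f in zip(coeffs, fragments)) % P
        for i in range(length)
    ]


def check_parity(parity, data_fingerprints, coeffs, r):
    """Check a parity fragment using only the data fragments' fingerprints.

    Relies on linearity: fingerprint(sum c_j * d_j) == sum c_j * fingerprint(d_j).
    """
    expected = sum(c * fp for c, fp in zip(coeffs, data_fingerprints)) % P
    return fingerprint(parity, r) == expected


# Small hand-checkable example (r is fixed here only for readability).
d1, d2 = [1, 2, 3], [4, 5, 6]
coeffs = [2, 3]
r = 10
parity = combine([d1, d2], coeffs)
fps = [fingerprint(d1, r), fingerprint(d2, r)]
ok = check_parity(parity, fps, coeffs, r)
bad = check_parity([14, 19, 25], fps, coeffs, r)
```

In this sketch `ok` is true and `bad` is false. For the check to be meaningful in practice, r would need to be chosen so that a faulty party cannot predict it before committing to the fragments; how that is done is outside this sketch.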
Low-overhead Byzantine fault-tolerant storage. SOSP '07.