ABSTRACT
Distributed systems are often designed to recover from downed nodes. Unfortunately, it is challenging to create recovery mechanisms that work in Byzantine networks, where an attacker controls some of the nodes and links. Often, an adversarial node can falsely claim that an honest node is offline, and there is no way to verify this claim or detect the liar.
To resolve this challenge, we design rRPC, a robust remote procedure call library for distributed systems running in Byzantine networks. rRPC ensures that either the call succeeds or the online party (caller or callee) can create a third-party verifiable proof that the other party is faulty. A distributed system can use these proofs to identify and remove a faulty node automatically. We implement a prototype of rRPC and use 20 to 100 EC2 VMs to evaluate its performance as a standalone library and in the context of a distributed mix-net system. Our results show that rRPC's overhead is low: it adds about a 1% increase in latency in the common case.
Index Terms
- Proving Server Faults: RPCs for Distributed Systems in Byzantine Networks