Consensus-Oriented Parallelization: How to Earn Your First Million

Authors:
Johannes Behl, TU Braunschweig
Tobias Distler, FAU Erlangen-Nürnberg
Rüdiger Kapitza, TU Braunschweig
Middleware '15: Proceedings of the 16th Annual Middleware Conference, November 2015, Pages 173–184
https://doi.org/10.1145/2814576.2814800

Published: 24 November 2015

ABSTRACT

Consensus protocols employed in Byzantine fault-tolerant systems are notoriously compute-intensive. Unfortunately, the traditional approach of executing instances of such protocols in a pipelined fashion is not well suited to modern multi-core processors and fundamentally restricts the overall performance of systems based on them. To solve this problem, we present the consensus-oriented parallelization (COP) scheme, which disentangles consecutive consensus instances and executes them in parallel in independent pipelines. Put in the terminology of our main target, today's processors, COP introduces superscalarity to the field of consensus protocols. In doing so, COP achieves 2.4 million operations per second on commodity server hardware, a factor of 6 compared to a contemporary pipelined approach measured on the same code base and a factor of over 20 compared to the highest throughput numbers published for such systems so far. More importantly, COP provides up to 3 times as much throughput on a single core as its competitors, and it can make use of additional cores where other approaches are confined by the slowest stage in their pipeline. This enables Byzantine fault tolerance for the emerging market of extremely demanding transactional systems and gives conventional deployments more room to increase their quality of service.
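
The idea of running consecutive consensus instances in independent pipelines can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how instances are assigned to pipelines, so the round-robin mapping by sequence number is an assumption, and the names `pipeline_for`, `run_pipeline`, and `order_in_parallel` are hypothetical. The agreement step is a placeholder string, and Python threads only show the structure, not the multi-core speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def pipeline_for(seq_no: int, num_pipelines: int) -> int:
    """Assumed mapping: assign instances to pipelines round-robin."""
    return seq_no % num_pipelines


def run_pipeline(pipeline_id, instances):
    """Stand-in for one independent pipeline.

    A real pipeline would run the full consensus protocol for each of
    its instances; here the decision is just a placeholder string.
    """
    return [
        (seq_no, f"decided({request}) by pipeline {pipeline_id}")
        for seq_no, request in instances
    ]


def order_in_parallel(requests, num_pipelines):
    # Each request gets a consensus instance identified by its sequence number.
    partitions = [[] for _ in range(num_pipelines)]
    for seq_no, request in enumerate(requests):
        partitions[pipeline_for(seq_no, num_pipelines)].append((seq_no, request))

    # The pipelines share no state, so they can run side by side.
    with ThreadPoolExecutor(max_workers=num_pipelines) as pool:
        results = list(pool.map(run_pipeline, range(num_pipelines), partitions))

    # Merge the decisions back into sequence-number order for execution.
    decided = sorted(
        (item for part in results for item in part),
        key=lambda item: item[0],
    )
    return [outcome for _, outcome in decided]


if __name__ == "__main__":
    for line in order_in_parallel(["a", "b", "c", "d", "e"], num_pipelines=2):
        print(line)
```

In this sketch only the final merge step sees all instances. Everything before it runs per pipeline, in contrast to a single pipeline whose throughput is bounded by its slowest stage.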

Published in
Middleware '15: Proceedings of the 16th Annual Middleware Conference
November 2015
295 pages
ISBN: 9781450336185
DOI: 10.1145/2814576
General Chairs: Rodger Lea (The University of British Columbia, Canada), Sathish Gopalakrishnan (The University of British Columbia, Canada)
Program Chairs: Eli Tilevich (Virginia Tech, USA), Amy L. Murphy (Bruno Kessler Foundation, Italy)
Publications Chair: Michael Blackstock (The University of British Columbia, Canada)
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
BFT
Multi-Core
Scalability
State-Machine Replication
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Middleware '15 Paper Acceptance Rate: 23 of 118 submissions, 19%
Overall Acceptance Rate: 203 of 948 submissions, 21%