ABSTRACT
Atomic multicast is a communication abstraction that allows messages to be addressed to, and reliably delivered by, multiple process groups while ensuring a partial order on delivered messages. Strong ordering guarantees can greatly simplify the design and implementation of distributed applications. One critical property for the performance and scalability of an atomic multicast protocol is genuineness: a protocol is genuine if only the sender and the destinations of a message are involved in ordering it. This paper presents PrimCast, the first genuine atomic multicast protocol able to deliver messages at every destination in three communication steps. PrimCast uses a primary-based consensus protocol to decide on message timestamps at each group. Unlike previous work, it does not rely on consensus to advance and maintain logical clocks. PrimCast introduces a novel approach, relying on simple quorum intersection, to decide when a multicast message can be delivered. We also show how loosely synchronized clocks can be used to reduce the convoy effect that delays messages under high system load. We present the complete algorithm for PrimCast and evaluate its performance under various scenarios. Our results show that PrimCast achieves lower latency than state-of-the-art approaches while providing higher or comparable throughput.
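The abstract builds on timestamp-based ordering: each destination group proposes a timestamp for a message, the message's final timestamp is the maximum of those proposals, and every group delivers in final-timestamp order. The sketch below illustrates that classic scheme (usually credited to Skeen) under strong simplifying assumptions. It is not PrimCast's algorithm. Each group is modeled as a single non-faulty process, there is no network, ties are broken by message id, and the `physical_now` argument is only a hypothetical illustration of how a loosely synchronized clock reading might raise a proposal. PrimCast's primary-based consensus, its quorum-intersection delivery rule, and all failure handling are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """One destination group, modeled as a single non-faulty process.

    Simplifying assumption: PrimCast instead replicates each group and uses a
    primary-based consensus protocol to decide timestamps.
    """
    name: str
    clock: int = 0
    # msg_id -> (timestamp, is_final); non-final entries hold local proposals.
    pending: dict[str, tuple[int, bool]] = field(default_factory=dict)
    delivered: list[str] = field(default_factory=list)

    def propose(self, msg_id: str, physical_now: int = 0) -> int:
        # A proposal exceeds every timestamp this group has seen so far. Raising it
        # to a physical clock reading is a hypothetical convoy-mitigation knob.
        self.clock = max(self.clock + 1, physical_now)
        self.pending[msg_id] = (self.clock, False)
        return self.clock

    def finalize(self, msg_id: str, final_ts: int) -> None:
        # The final timestamp is the maximum proposal over all destination groups.
        # Advancing the clock past it keeps every later proposal strictly larger.
        self.clock = max(self.clock, final_ts)
        self.pending[msg_id] = (final_ts, True)
        self._try_deliver()

    def _try_deliver(self) -> None:
        # Deliver in (timestamp, msg_id) order while the smallest entry is final.
        # A pending proposal can only grow into its final timestamp, so a final
        # entry ordered before every pending proposal is safe to deliver.
        while self.pending:
            msg_id, (ts, final) = min(
                self.pending.items(), key=lambda kv: (kv[1][0], kv[0])
            )
            if not final:
                break
            del self.pending[msg_id]
            self.delivered.append(msg_id)


# Two messages, "a" and "b", both addressed to groups g1 and g2, whose
# proposals arrive in opposite orders at the two groups.
g1, g2 = Group("g1"), Group("g2")
p_a1 = g1.propose("a")
p_b2 = g2.propose("b")
p_b1 = g1.propose("b")
p_a2 = g2.propose("a")
ts_a = max(p_a1, p_a2)
ts_b = max(p_b1, p_b2)
for g in (g1, g2):
    g.finalize("a", ts_a)
    g.finalize("b", ts_b)
print(g1.delivered, g2.delivered)
```

Both groups deliver `a` before `b` even though they received the proposals in different orders. At `g2`, message `a` is already final but must wait until `b`, whose pending proposal is lower, is decided. Under load, waiting of this kind can compound into the convoy effect that the abstract mentions.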
Index Terms
- PrimCast: A Latency-Efficient Atomic Multicast