DOI: 10.1145/3590140.3629110
research-article

PrimCast: A Latency-Efficient Atomic Multicast

Published: 27 November 2023

ABSTRACT

Atomic multicast is a communication abstraction that allows messages to be addressed to, and reliably delivered by, multiple process groups while ensuring a partial order on delivered messages. Such strong ordering guarantees can greatly simplify the design and implementation of distributed applications. One property critical to the performance and scalability of an atomic multicast protocol is genuineness: a protocol is genuine if only the sender and the destinations of a message are involved in ordering it. This paper presents PrimCast, the first genuine atomic multicast protocol able to deliver messages at every destination in three communication steps. PrimCast uses a primary-based consensus protocol to decide on message timestamps at each group. Unlike previous work, it does not rely on consensus for advancing and maintaining logical clocks. PrimCast introduces a novel approach, relying on simple quorum intersection, to decide when a multicast message can be delivered. We also show how loosely synchronized clocks can be used to reduce the convoy effect that delays messages under high system load. We present the complete algorithm for PrimCast and evaluate its performance under various scenarios. Our results show that PrimCast achieves lower latency than state-of-the-art approaches while providing higher or comparable throughput.
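To make the role of per-group timestamps more concrete, the following is a minimal sketch of classic timestamp-based multicast ordering (in the style of Skeen's scheme), not PrimCast's algorithm. It assumes each group is a single, failure-free logical process, so it involves no consensus, primaries, or quorums. The names `Group`, `propose`, `decide`, and `demo` are hypothetical and chosen only for illustration. Each destination group proposes a local timestamp, the final timestamp is the maximum of the proposals, and a group delivers a message only once no pending message could still be ordered before it.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """A destination group, modelled as one failure-free process (assumption)."""
    name: str
    clock: int = 0
    pending: dict = field(default_factory=dict)    # msg id -> local proposal
    final: dict = field(default_factory=dict)      # msg id -> final timestamp
    delivered: list = field(default_factory=list)  # delivery order at this group

    def propose(self, msg_id):
        """Assign a local timestamp to a message addressed to this group."""
        self.clock += 1
        self.pending[msg_id] = self.clock
        return self.clock

    def decide(self, msg_id, final_ts):
        """Record the final timestamp and advance the logical clock past it."""
        del self.pending[msg_id]
        self.final[msg_id] = final_ts
        self.clock = max(self.clock, final_ts)
        self._try_deliver()

    def _try_deliver(self):
        # Deliver decided messages in (timestamp, id) order. A pending message
        # ends up with a final timestamp >= its local proposal, so it blocks
        # delivery only if its (proposal, id) is smaller than the candidate's.
        while self.final:
            msg_id = min(self.final, key=lambda m: (self.final[m], m))
            ts = self.final[msg_id]
            if any((p, m) < (ts, msg_id) for m, p in self.pending.items()):
                return
            self.delivered.append(msg_id)
            del self.final[msg_id]


def demo():
    """Two messages, both addressed to groups a and b, proposed in opposite orders."""
    a, b = Group("a"), Group("b")
    proposals = {
        "m1": [a.propose("m1")],
        "m2": [b.propose("m2")],
    }
    proposals["m2"].append(a.propose("m2"))
    proposals["m1"].append(b.propose("m1"))
    # Only the destination groups contribute proposals (the genuineness idea).
    final = {m: max(ts) for m, ts in proposals.items()}
    # Decisions arrive in different orders at the two groups.
    a.decide("m2", final["m2"])
    a.decide("m1", final["m1"])
    b.decide("m1", final["m1"])
    b.decide("m2", final["m2"])
    return a.delivered, b.delivered


if __name__ == "__main__":
    print(demo())
```

In this sketch both groups deliver the two messages in the same relative order even though proposals and decisions arrive in different orders. A fault-tolerant protocol additionally has to agree on each group's timestamp among that group's replicas, which is the part the abstract attributes to primary-based consensus.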


Published in

Middleware '23: Proceedings of the 24th International Middleware Conference
November 2023
334 pages
ISBN: 9798400701771
DOI: 10.1145/3590140

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 203 of 948 submissions, 21%