RT-ZooKeeper: Taming the Recovery Latency of a Coordination Service

Published: 22 September 2021

Abstract

Fault-tolerant coordination services are widely used in distributed applications in cloud environments. Recent years have witnessed the emergence of time-sensitive applications deployed in edge computing environments, which introduces both challenges and opportunities for coordination services. On the one hand, coordination services must recover from failures in a timely manner. On the other hand, edge computing employs local networked platforms that can be exploited to achieve timely recovery. In this work, we first identify the limitations of the leader election and recovery protocols underlying Apache ZooKeeper, the prevailing open-source coordination service. To reduce the latency of recovery from leader failures, we then design RT-ZooKeeper with a set of novel features including a fast-convergence election protocol, a quorum channel notification mechanism, and a distributed epoch persistence protocol. We have implemented RT-ZooKeeper based on ZooKeeper version 3.5.8. Empirical evaluation shows that RT-ZooKeeper achieves a 91% reduction in maximum recovery latency compared to ZooKeeper. Furthermore, a case study demonstrates that fast failure recovery in RT-ZooKeeper can benefit a common messaging service such as Kafka in terms of message latency.

      Published In

      ACM Transactions on Embedded Computing Systems, Volume 20, Issue 5s
      Special Issue ESWEEK 2021, CASES 2021, CODES+ISSS 2021 and EMSOFT 2021
      October 2021
      1367 pages
      ISSN: 1539-9087
      EISSN: 1558-3465
      DOI: 10.1145/3481713
      Editor: Tulika Mitra
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 22 September 2021
      Accepted: 01 July 2021
      Revised: 01 June 2021
      Received: 01 April 2021
      Published in TECS Volume 20, Issue 5s

      Author Tags

      1. Real-time fault tolerance
      2. Apache ZooKeeper
      3. response time analysis

      Qualifiers

      • Research-article
      • Refereed

      Funding Sources

      • NSF
      • Fullgraf Foundation
