DOI: 10.1145/3613372.3613409
research-article

How The Retry Pattern Impacts Application Performance: A Controlled Experiment

Published: 25 September 2023

Abstract

Distributed application developers typically use resiliency patterns such as Retry, Circuit Breaker, and Fail Fast to handle remote service failures. However, limited research exists on how these patterns may impact performance across various operational conditions. This paper presents a controlled experiment assessing the performance of over 100 Retry pattern configurations in Java and C#, using the Resilience4j and Polly libraries, respectively, under different workloads and failure rates. Our experimental results indicate that increasing any of the three Retry parameters investigated (i.e., the initial backoff delay, the backoff delay multiplier, and the maximum number of retries) reduces response time but raises execution time, with effects intensifying exponentially as failure rates grow. An analysis using a state-of-the-art model explainer reveals that the initial backoff delay’s impact is twice that of the other parameters at low to moderate failure rates, with more balanced effects at high rates. These findings apply to both Resilience4j and Polly, with Polly’s impact being slightly higher due to subtle implementation differences. Our results can benefit both distributed application developers and researchers. Developers can learn from our findings to tailor the Retry pattern to their applications’ needs. Researchers can expand upon our work to enhance our collective understanding of resiliency patterns’ impact and implications.
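To make the three Retry parameters concrete, here is a minimal sketch assuming a standard exponential backoff: the wait before retry i (counting from zero) is the initial backoff delay multiplied by the multiplier raised to the power i. This is not the Resilience4j or Polly implementation. The names `backoff_schedule`, `call_with_retry`, `initial_delay`, `multiplier`, and `max_retries` are hypothetical. The sketch assumes `max_retries` counts retries after the first call and omits jitter; the libraries' exact semantics may differ.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_schedule(initial_delay: float, multiplier: float, max_retries: int) -> List[float]:
    """Return the wait before each retry: initial_delay * multiplier**i for i = 0..max_retries-1."""
    return [initial_delay * multiplier ** i for i in range(max_retries)]


def call_with_retry(
    operation: Callable[[], T],
    initial_delay: float,
    multiplier: float,
    max_retries: int,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on any exception with exponential backoff.

    At most 1 + max_retries calls are made. If the final call also fails,
    its exception propagates to the caller.
    """
    for delay in backoff_schedule(initial_delay, multiplier, max_retries):
        try:
            return operation()
        except Exception:
            # Failed attempt: wait before the next retry (no jitter in this sketch).
            sleep(delay)
    # Last attempt: no retries remain, so any exception is re-raised as-is.
    return operation()


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> str:
        # Hypothetical remote call that fails twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("simulated remote failure")
        return "ok"

    waits: List[float] = []
    # Record the waits instead of sleeping.
    print(call_with_retry(flaky, 0.1, 2.0, 3, sleep=waits.append), waits)
    # prints: ok [0.1, 0.2]
```

With initial delay d0, multiplier m (m ≠ 1), and n retries, the worst-case total wait is d0 (m^n − 1)/(m − 1). For example, 100 ms, 2.0, and 3 retries yield waits of 100, 200, and 400 ms, for 700 ms in total. Raising any of the three parameters lengthens this sum, which is consistent with the execution-time increase described above. The `sleep` argument is injectable so that the schedule can be checked without real waiting.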

Cited By

  • (2024) Static Configurations Pose Challenges to Resilience Patterns. Intelligent Systems Design and Applications, 330–337. DOI: 10.1007/978-3-031-64850-2_31. Online publication date: 2-Aug-2024.
  • (2024) A declarative approach and benchmark tool for controlled evaluation of microservice resiliency patterns. Software: Practice and Experience 55:1, 170–192. DOI: 10.1002/spe.3368. Online publication date: 28-Aug-2024.

      Published In

      ACM Other conferences
      SBES '23: Proceedings of the XXXVII Brazilian Symposium on Software Engineering
      September 2023
      570 pages
      ISBN:9798400707872
      DOI:10.1145/3613372
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. controlled experiment
      2. performance analysis
      3. retry pattern

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Funding Sources

      • CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico (National Council for Scientific and Technological Development)

      Conference

      SBES 2023
      SBES 2023: XXXVII Brazilian Symposium on Software Engineering
      September 25 - 29, 2023
      Campo Grande, Brazil

      Acceptance Rates

      Overall Acceptance Rate 147 of 427 submissions, 34%

      Bibliometrics

      • Downloads (last 12 months): 38
      • Downloads (last 6 weeks): 3
      Reflects downloads up to 05 Mar 2025
