DOI: 10.1145/3238147.3241535
short-paper

A multi-objective framework for effective performance fault injection in distributed systems

Published: 03 September 2018

ABSTRACT

Modern distributed systems should be built to anticipate performance degradation. Requests in these systems often involve tens to thousands of Remote Procedure Calls (RPCs), each of which can be a source of performance degradation. The PhD programme presented here intends to address this issue by providing automated instruments that effectively drive performance fault injection in distributed systems. The envisioned approach exploits multi-objective search-based techniques to automatically find small combinations of tiny performance degradations, induced by specific RPCs, that have a significant impact on user-perceived performance. Automating the search for these events will improve the ability to inject performance issues in production, in order to force developers to anticipate and mitigate them.
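The multi-objective idea can be illustrated with a toy sketch. It is not the method of the paper: the RPC names, baseline latencies, candidate delays, and request model are all hypothetical, and exhaustive enumeration with a Pareto filter stands in for a search-based algorithm, which would be needed once the space is too large to enumerate. The three objectives are the number of degraded RPCs, the total injected delay, and the increase in end-to-end latency.

```python
from itertools import product

# Hypothetical baseline latencies (ms) for four RPCs; not taken from the paper.
BASELINE = {"auth": 5, "cart": 20, "search": 30, "ads": 10}
DELAYS = (0, 5, 10)  # candidate injected delays (ms) per RPC


def end_to_end_latency(injected):
    """Toy request model: auth runs first, then cart/search/ads in parallel."""
    lat = {rpc: BASELINE[rpc] + injected.get(rpc, 0) for rpc in BASELINE}
    return lat["auth"] + max(lat["cart"], lat["search"], lat["ads"])


def objectives(injected):
    """Return a tuple in which every component is to be minimised."""
    active = {rpc: d for rpc, d in injected.items() if d > 0}
    impact = end_to_end_latency(active) - end_to_end_latency({})
    # Few degraded RPCs, little injected delay, large user-visible impact.
    return (len(active), sum(active.values()), -impact)


def dominates(a, b):
    """a dominates b if it is no worse everywhere and differs somewhere."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def pareto_front():
    """Enumerate every injection and keep the non-dominated ones."""
    rpcs = list(BASELINE)
    candidates = []
    for combo in product(DELAYS, repeat=len(rpcs)):
        injected = dict(zip(rpcs, combo))
        candidates.append((injected, objectives(injected)))
    return [
        (inj, obj)
        for inj, obj in candidates
        if not any(dominates(other, obj) for _, other in candidates)
    ]


if __name__ == "__main__":
    for injected, (count, total, neg_impact) in pareto_front():
        print(injected, "rpcs:", count, "delay:", total, "impact:", -neg_impact)
```

In this toy model only `auth` and `search` lie on the critical path, so every non-dominated injection leaves `cart` and `ads` untouched. That is the kind of small, high-impact combination the abstract describes.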


      • Published in

        ASE '18: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering
        September 2018
        955 pages
        ISBN: 9781450359375
        DOI: 10.1145/3238147

        Copyright © 2018 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 82 of 337 submissions, 24%
