short-paper

A multi-objective framework for effective performance fault injection in distributed systems

Author:
Luca Traini

University of L'Aquila, Italy


ASE '18: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, September 2018, Pages 936–939. https://doi.org/10.1145/3238147.3241535

Published: 03 September 2018


ABSTRACT

Modern distributed systems should be built to anticipate performance degradation. Requests in these systems often involve tens to thousands of Remote Procedure Calls (RPCs), each of which can be a source of performance degradation. The PhD programme presented here aims to address this issue by providing automated instruments that effectively drive performance fault injection in distributed systems. The envisioned approach exploits multi-objective search-based techniques to automatically find small combinations of tiny performance degradations, induced by specific RPCs, that have a significant impact on user-perceived performance. Automating the search for these events will improve the ability to inject performance issues in production, forcing developers to anticipate and mitigate them.
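To make the search idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical three-stage call graph (the RPC names, base latencies and the 10 ms injected delay are all invented for illustration), models a request as sequential stages of parallel RPCs, and enumerates every combination of degraded RPCs exhaustively, where a realistic system would need a search heuristic. The two objectives are the number of degraded RPCs (to minimise) and the increase in end-to-end latency (to maximise); the function returns the non-dominated combinations.

```python
from itertools import combinations

# Hypothetical toy call graph: the RPCs of one stage run in parallel,
# and the stages run one after another. Latencies are in milliseconds.
STAGES = [
    {"auth": 5.0},
    {"catalog": 20.0, "pricing": 12.0, "ads": 8.0},
    {"render": 10.0},
]
DELAY_MS = 10.0  # assumed size of one "tiny" injected degradation


def request_latency(injected):
    """End-to-end latency when every RPC in `injected` is slowed by DELAY_MS."""
    total = 0.0
    for stage in STAGES:
        # A stage finishes when its slowest RPC finishes.
        total += max(
            base + (DELAY_MS if rpc in injected else 0.0)
            for rpc, base in stage.items()
        )
    return total


def dominates(a, b):
    """True if a is no worse than b on both objectives and better on one.

    Each argument is (num_injected, impact): fewer injected RPCs is better,
    larger latency impact is better.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    better = a[0] < b[0] or a[1] > b[1]
    return no_worse and better


def pareto_front():
    """Return all non-dominated (size, impact_ms, rpc_combination) tuples."""
    rpcs = sorted(rpc for stage in STAGES for rpc in stage)
    baseline = request_latency(frozenset())
    candidates = []
    for size in range(1, len(rpcs) + 1):
        for combo in combinations(rpcs, size):
            impact = request_latency(frozenset(combo)) - baseline
            candidates.append((size, impact, combo))
    return [
        c for c in candidates
        if not any(dominates(o[:2], c[:2]) for o in candidates)
    ]


if __name__ == "__main__":
    for size, impact, combo in pareto_front():
        print(size, impact, combo)
```

In this toy graph the front contains only combinations of RPCs on the critical path (auth, catalog, render); slowing ads or pricing barely changes the user-perceived latency because catalog is the slowest RPC in their stage.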


Index Terms

A multi-objective framework for effective performance fault injection in distributed systems
1. Software and its engineering
  1. Software creation and management
    1. Search-based software engineering
  2. Software organization and properties
    1. Extra-functional properties
      1. Software performance


Published in
ASE '18: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering
September 2018
955 pages
ISBN:9781450359375
DOI:10.1145/3238147
General Chair: Marianne Huchard, University of Montpellier, France
Program Chairs: Christian Kästner, Carnegie Mellon University, USA; Gordon Fraser, University of Passau, Germany
Copyright © 2018 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Distributed Systems
Fault Injection
Search-Based Software Engineering
Software Performance
Qualifiers
- short-paper
Acceptance Rates
Overall Acceptance Rate: 82 of 337 submissions, 24%