Abstract
Modern distributed systems are designed to tolerate unreliable environments: they aim to keep providing services even when failures occur in the underlying hardware or network. However, an unreliable environment can significantly affect the performance of a distributed system, and this impact should be considered when deploying services. In this paper, we present an approach to optimizing the performance of distributed systems in unreliable deployment environments by searching for optimal configuration parameters. To simulate an unreliable environment, we inject failures into the environment of a service application, such as a node crash in the cluster, network failures between nodes, and resource contention on nodes. We then use a search algorithm to automatically find the optimal parameters in a user-selected parameter space under the unreliable environment we created. We have implemented our approach in a testing-based framework and applied it to several well-known distributed service systems.
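The idea of searching a parameter space while a failure is active can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the parameter names, the toy cost model `simulated_latency` (a stand-in for actually running a workload with an injected node crash), and the use of plain random search, which is not necessarily the search algorithm used in the paper.

```python
import random

# Hypothetical user-selected parameter space; names and values are illustrative.
PARAM_SPACE = {
    "replication_factor": [1, 2, 3],
    "retry_timeout_ms": [100, 500, 1000, 5000],
    "worker_threads": [2, 4, 8, 16],
}


def simulated_latency(params, node_crash):
    """Toy stand-in for measuring a workload's execution time (assumed model)."""
    latency = 1000.0 / params["worker_threads"]      # more threads, less time
    latency += 20.0 * params["replication_factor"]   # replication overhead
    if node_crash:
        # An injected crash costs one retry timeout, softened by extra replicas.
        latency += params["retry_timeout_ms"] / params["replication_factor"]
    return latency


def random_search(space, cost, trials, seed=0):
    """Sample configurations uniformly at random and keep the cheapest one."""
    rng = random.Random(seed)
    best_params, best_cost = None, float("inf")
    for _ in range(trials):
        candidate = {name: rng.choice(values) for name, values in space.items()}
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best_params, best_cost = candidate, candidate_cost
    return best_params, best_cost


if __name__ == "__main__":
    for crash in (False, True):
        params, cost = random_search(
            PARAM_SPACE, lambda p: simulated_latency(p, crash), trials=200
        )
        print("node crash injected:", crash, "->", params, cost)
```

In this toy model the best configuration changes once the crash is injected: without it, a replication factor of 1 is cheapest, while with it a replication factor of 2 and the shortest timeout win. That shift is the effect environment-sensitive tuning is meant to capture.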
F. Ivančić and G. Balakrishnan—Current affiliation: Google, Inc.
Notes
1. Execution time is not obtained from Ganglia, but from the Linux time command.
References
Ganglia. http://ganglia.sourceforge.net/
Juju. https://juju.ubuntu.com/
Juju Charms. https://jujucharms.com/
LoadRunner. http://www.hp.com/go/LoadRunner
Selenium. http://seleniumhq.org/
Allspaw, J.: Fault injection in production. Commun. ACM 55(10), 48–52 (2012)
Babu, S.: Towards automatic optimization of MapReduce programs. In: SOCC 2010, pp. 137–142 (2010)
Banabic, R., Candea, G.: Fast black-box testing of system recovery code. In: Proceedings of the 7th ACM European Conference on Computer Systems, EuroSys 2012, pp. 281–294 (2012)
Broadwell, P., Sastry, N., Traupman, J.: FIG: A prototype tool for online verification of recovery. In: Workshop on Self-Healing, Adaptive and Self-Managed Systems (2002)
Carbone, M., Rizzo, L.: Dummynet revisited. SIGCOMM Comput. Commun. Rev. 40(2), 12–20 (2010)
Dawson, S., Jahanian, F., Mitton, T.: Experiments on six commercial TCP implementations using a software fault injection tool. Softw. Pract. Exper. 27(12), 1385–1410 (1997)
Gunawi, H., Do, T., Joshi, P., Alvaro, P., Hellerstein, J., Arpaci-Dusseau, A., Arpaci-Dusseau, R., Sen, K., Borthakur, D.: FATE and DESTINI: A framework for cloud recovery testing. In: Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation, NSDI 2011, pp. 18–18 (2011)
Herodotou, H., Babu, S.: Profiling, What-if analysis, and cost-based optimization of MapReduce programs. In: VLDB 2011, pp. 1111–1122 (2011)
Herodotou, H., Lim, H., Luo, G., Borisov, N., Dong, L., Cetin, F.B., Babu, S.: Starfish: A self-tuning system for big data analytics. In: CIDR 2011, pp. 261–272 (2011)
Hoarau, W., Tixeuil, S., Vauchelles, F.: FAIL-FCI: Versatile fault injection. Future Gener. Comput. Syst. 23(7), 913–919 (2007)
Joshi, P., Ganai, M., Balakrishnan, G., Gupta, A., Papakonstantinou, N.: SETSUDO: Perturbation-based testing framework for scalable distributed systems. In: Proceedings of the Conference on Timely Results in Operating Systems (2013)
Joshi, P., Gunawi, H., Sen, K.: PREFAIL: A programmable tool for multiple-failure injection. In: Proceedings of the 2011 ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA 2011, pp. 171–188 (2011)
Lubke, R., Lungwitz, R., Schuster, D., Schill, A.: Large-scale tests of distributed systems with integrated emulation of advanced network behavior. WWW/Internet 10(2), 138–151 (2013)
Marinescu, P., Candea, G.: Efficient testing of recovery code using fault injection. ACM Trans. Comput. Syst. 29(4), 11:1–11:38 (2011)
Molyneaux, I.: The Art of Application Performance Testing: Help for Programmers and Quality Assurance. O’Reilly Media (2009)
Tseitlin, A.: The antifragile organization. Commun. ACM 56(8), 40–44 (2013)
Ye, T., Kalyanaraman, S.: A recursive random search algorithm for large-scale network parameter configuration. In: SIGMETRICS 2003, pp. 196–205 (2003)
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Lin, Y., Ivančić, F., Joshi, P., Balakrishnan, G., Ganai, M., Gupta, A. (2015). Environment-Sensitive Performance Tuning for Distributed Service Orchestration. In: Daydé, M., Marques, O., Nakajima, K. (eds) High Performance Computing for Computational Science -- VECPAR 2014. VECPAR 2014. Lecture Notes in Computer Science, vol 8969. Springer, Cham. https://doi.org/10.1007/978-3-319-17353-5_18
DOI: https://doi.org/10.1007/978-3-319-17353-5_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17352-8
Online ISBN: 978-3-319-17353-5
eBook Packages: Computer Science, Computer Science (R0)