ABSTRACT
The rise of elastically scaling applications that frequently deploy new machines has led to the adoption of DevOps practices across the cloud engineering stack. So-called configuration management tools use scripts that are based on declarative resource descriptions and make the system converge to the desired state. It is crucial for convergent configurations to handle transient faults gracefully, e.g., network outages when downloading and installing software packages. In this paper, we introduce a conceptual framework for asserting reliable convergence in configuration management. Based on a formal definition of configuration scripts and their resources, we use state transition graphs to test whether a script makes the system converge to the desired state under different conditions. In our generalized model, configuration actions are partially ordered, often resulting in prohibitively many possible execution orders. To reduce this problem space, we define and analyze a property called preservation, and we show that if preservation holds for all pairs of resources, then convergence holds for the entire configuration. Our implementation builds on Puppet, but the approach is equally applicable to other frameworks such as Chef and Ansible. We perform a comprehensive evaluation based on real-world Puppet scripts and show the effectiveness of the approach. Our tool detects all idempotence- and convergence-related issues in a set of existing Puppet scripts with known issues, as well as some hitherto undiscovered bugs in a large random sample of scripts.
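To illustrate why a pairwise check can stand in for enumerating every execution order, the following minimal Python sketch models resources as pure functions over a system state. It is not the tool described above and does not use Puppet. The `Resource` class, the `preserves` helper, the sample resources, and in particular the reading of preservation used here (applying one resource never undoes the desired state of another) are simplifying assumptions made for illustration only; the formal definition is given in the paper itself.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource: an action plus a check for its desired state."""
    name: str
    apply: Callable[[State], State]     # returns a new state, never mutates
    satisfied: Callable[[State], bool]  # True once the desired state holds


def preserves(a: Resource, b: Resource, states: List[State]) -> bool:
    """Assumed reading of preservation: applying `a` never undoes `b`."""
    for start in states:
        reached = b.apply(start)
        if b.satisfied(reached) and not b.satisfied(a.apply(reached)):
            return False
    return True


def pairwise_violations(resources: List[Resource],
                        states: List[State]) -> List[Tuple[str, str]]:
    """Check every ordered pair instead of every full execution order."""
    return [(a.name, b.name)
            for a, b in permutations(resources, 2)
            if not preserves(a, b, states)]


# Illustrative resources (assumed, not taken from any real script).
package = Resource(
    "package",
    lambda s: {**s, "nginx": "installed"},
    lambda s: s.get("nginx") == "installed",
)
config = Resource(
    "config",
    lambda s: {**s, "conf": "custom"},
    lambda s: s.get("conf") == "custom",
)
# A package action that overwrites the configuration file on every run.
clobbering_package = Resource(
    "package-clobbering",
    lambda s: {**s, "nginx": "installed", "conf": "default"},
    lambda s: s.get("nginx") == "installed",
)

# Sample start states to probe; a real test would derive these systematically.
states: List[State] = [{}, {"nginx": "installed", "conf": "custom"}]

print(pairwise_violations([package, config], states))
print(pairwise_violations([clobbering_package, config], states))
```

With n resources, the sketch inspects n(n-1) ordered pairs rather than up to n! execution orders, which is the reduction in problem space that the preservation property is meant to provide. The second call flags the pair in which the package action overwrites the configuration file.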
Asserting reliable convergence for configuration management scripts (OOPSLA '16)