DOI: 10.1145/2755644.2755647 (ACM Conferences, HPDC conference proceedings, research article)

Architecting a Persistent and Reliable Configuration Management System

Published: 16 June 2015

ABSTRACT

Streamlined configuration management plays a significant role in modern, complex distributed systems. Via mechanisms that promote consistency, repeatability, and transparency, configuration management systems (CMSes) address complexity and aim to increase the efficiency of administrative procedures, including deployment and failure recovery scenarios. Considering the importance of minimizing disruptions in these systems, we design an architecture that increases persistency and reliability of infrastructure management. We present our architecture in the context of hybrid, cluster-cloud environments and describe our highly available implementation that builds upon the open source CMS called Chef and infrastructure-as-a-service cloud resources from Amazon Web Services. We demonstrate how we enabled a smooth transition from the pre-existing single-server configuration to the proposed highly available management system. We summarize our experience with managing a 20-node Linux cluster using this implementation. Our analysis of utilization and cost of necessary cloud resources indicates that the designed system is a low-cost alternative to acquiring additional physical hardware for hardening cluster management. We also highlight the prototype's security and manageability features that are suitable for larger, production-ready deployments.

