A Cost-Efficient Container Orchestration Strategy in Kubernetes-Based Cloud Computing Infrastructures with Heterogeneous Resources

Published: 17 April 2020

Abstract

Containers, as a lightweight application virtualization technology, have recently gained immense popularity in mainstream cluster management systems such as Google Borg and Kubernetes. Widely adopted by these systems to deploy diverse workloads such as big data, web services, and IoT applications, containers support agile application deployment, environmental consistency, OS distribution portability, application-centric management, and resource isolation. Although most of these systems are mature and feature-rich, their optimization strategies are still tailored to the assumption of a static cluster. Elastic compute resources would enable heterogeneous resource management strategies that respond to the dynamic business volume of various types of workloads. Hence, we propose a heterogeneous task allocation strategy for cost-efficient container orchestration, based on resource utilization optimization and elastic instance pricing, with three main features. The first is support for heterogeneous job configurations, which optimizes the initial placement of containers onto existing resources through task packing. The second is cluster size adjustment through autoscaling algorithms to meet changing workloads. The third is a rescheduling mechanism that shuts down underutilized VM instances to save cost and reallocates the affected jobs without losing task progress. We evaluate our approach in terms of cost and performance on the Australian National Cloud Infrastructure (Nectar). Our experiments demonstrate that the proposed strategy could reduce the overall cost by 23% to 32% for different types of cloud workload patterns compared to the default Kubernetes framework.
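The abstract names three mechanisms: packing containers onto existing instances, growing the cluster when pending containers do not fit, and shutting down underutilized instances after moving their jobs elsewhere. The Python sketch below shows one way these pieces can fit together. It is a minimal illustration under stated assumptions, not the authors' algorithm: placement uses a two-dimensional (CPU and memory) best-fit-decreasing heuristic, scale-out launches the cheapest instance type that can host the container, and a node below a utilization threshold is released only if all of its containers can be repacked on the remaining nodes. The class and function names, the instance catalog and prices, and the 0.3 threshold are hypothetical. Moving a container here is pure bookkeeping; the sketch does not model the checkpoint/restore step needed to preserve task progress.

```python
# Hypothetical model of cluster state; NOT the paper's implementation.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass(frozen=True)
class InstanceType:
    name: str
    cpu: float    # vCPUs
    mem: float    # GiB
    price: float  # hypothetical cost per hour

@dataclass(frozen=True)
class Container:
    name: str
    cpu: float
    mem: float

@dataclass(eq=False)  # identity equality, so two identical nodes stay distinct
class Node:
    itype: InstanceType
    pods: list[Container] = field(default_factory=list)

    def free(self) -> tuple[float, float]:
        return (self.itype.cpu - sum(p.cpu for p in self.pods),
                self.itype.mem - sum(p.mem for p in self.pods))

    def fits(self, c: Container) -> bool:
        cpu, mem = self.free()
        return c.cpu <= cpu and c.mem <= mem

    def utilization(self) -> float:
        # Dominant share: the larger of the CPU and memory fractions in use.
        cpu, mem = self.free()
        return max(1 - cpu / self.itype.cpu, 1 - mem / self.itype.mem)

def largest_first(pods: list[Container]) -> list[Container]:
    return sorted(pods, key=lambda c: (c.cpu, c.mem), reverse=True)

def place_best_fit(nodes: list[Node], c: Container) -> bool:
    """Place c on the fitting node with the least normalized leftover capacity."""
    def leftover(n: Node) -> float:
        cpu, mem = n.free()
        return (cpu - c.cpu) / n.itype.cpu + (mem - c.mem) / n.itype.mem
    fitting = [n for n in nodes if n.fits(c)]
    if not fitting:
        return False
    min(fitting, key=leftover).pods.append(c)
    return True

def schedule(nodes: list[Node], pending: list[Container],
             catalog: list[InstanceType]) -> list[Node]:
    """Pack pending containers onto existing nodes; scale out only when none fits."""
    for c in largest_first(pending):
        if place_best_fit(nodes, c):
            continue
        types = [t for t in catalog if c.cpu <= t.cpu and c.mem <= t.mem]
        if not types:
            raise ValueError(f"no instance type can host {c.name}")
        nodes.append(Node(min(types, key=lambda t: t.price), [c]))
    return nodes

def consolidate(nodes: list[Node], threshold: float = 0.3) -> list[Node]:
    """Release nodes below `threshold` whose containers all fit on the remaining nodes."""
    released = []
    for victim in sorted(nodes, key=Node.utilization):
        if victim.utilization() >= threshold:
            continue  # may have received containers from an earlier victim
        others = [n for n in nodes if n is not victim]
        # Trial-pack on copies so a failed attempt leaves the cluster untouched.
        trial = [Node(n.itype, list(n.pods)) for n in others]
        if all(place_best_fit(trial, c) for c in largest_first(victim.pods)):
            for real, trial_node in zip(others, trial):
                real.pods = trial_node.pods
            nodes.remove(victim)
            released.append(victim)
    return released

def hourly_cost(nodes: list[Node]) -> float:
    return sum(n.itype.price for n in nodes)

if __name__ == "__main__":
    catalog = [InstanceType("small", 2, 4, 1.0), InstanceType("large", 8, 16, 3.5)]
    cluster = schedule([], [Container("a", 1.5, 3), Container("b", 1, 2),
                            Container("c", 0.5, 1)], catalog)
    print(len(cluster), hourly_cost(cluster))  # 2 2.0
    for n in cluster:  # job "a" finishes
        n.pods = [p for p in n.pods if p.name != "a"]
    consolidate(cluster)
    print(len(cluster), hourly_cost(cluster))  # 1 1.0
```

In the demo, job `a` finishing leaves the first instance at 25% utilization, so consolidation moves container `c` onto the second instance and releases the first, halving the hypothetical hourly cost. Best-fit decreasing is only one common heuristic for variable-sized bin packing; a real scheduler would also have to weigh placement constraints and the cost of migrating running jobs.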

Information

Published In

ACM Transactions on Internet Technology, Volume 20, Issue 2
Special Section on Emotions in Conflictual Social Interactions and Regular Papers
May 2020
256 pages
ISSN:1533-5399
EISSN:1557-6051
DOI:10.1145/3386441
Editor: Ling Liu
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 April 2020
Accepted: 01 January 2020
Revised: 01 November 2019
Received: 01 August 2019
Published in TOIT Volume 20, Issue 2

Author Tags

  1. Cluster management
  2. container orchestration
  3. cost efficiency
  4. resource heterogeneity

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • China Scholarship Council and the Australian Research Council Discovery Project
