Workload-aware resource management for software-defined compute

Nam, Yoonsung; Kang, Minkyu; Sung, Hanul; Kim, Jincheol; Eom, Hyeonsang

doi:10.1007/s10586-016-0613-6

Published: 18 August 2016

Volume 19, pages 1555–1570, (2016)

Cluster Computing

Yoonsung Nam¹, Minkyu Kang¹, Hanul Sung¹, Jincheol Kim² & Hyeonsang Eom¹

Abstract

With the advance of cloud computing technologies, increasingly diverse and heterogeneous workloads run on cloud datacenters. As more workloads run on a datacenter, contention for its limited shared resources may increase, which complicates resource management and often leads to low resource utilization. For effective resource management, datacenter management software should be redesigned to operate in a software-defined way, dynamically allocating the “right” resources to workloads based on their characteristics, so that datacenters can decrease their operating cost while meeting service level objectives such as latency requirements. However, recent datacenter resource management frameworks do not operate in such a software-defined way, which leads not only to wasted resources but also to performance degradation. To address this problem, we have designed and developed a workload-aware resource management framework for software-defined compute. The framework consists mainly of a workload profiler and workload-aware schedulers. To demonstrate the effectiveness of the framework, we have prototyped schedulers that minimize interference on shared computing and memory resources. We have compared them with the existing schedulers in OpenStack and VMware vSphere testbeds and evaluated their effectiveness in high-contention scenarios. Our experimental study suggests that the proposed approach can yield up to 100% improvement in throughput and up to 95% reduction in tail latency for latency-critical workloads compared to the existing schedulers.
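The pairing of a workload profiler with an interference-minimizing scheduler can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract does not detail: the `Profile` fields, the `interference_score` formula, the capacity of 1.0 per resource, and the doubling penalty for latency-critical workloads are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical profiler output; pressures are normalized so 1.0 = one host's capacity."""
    name: str
    cpu_pressure: float      # assumed measure of demand on shared compute
    memory_pressure: float   # assumed measure of demand on shared cache / memory bandwidth
    latency_critical: bool = False


@dataclass
class Host:
    name: str
    workloads: list = field(default_factory=list)

    def pressure(self):
        """Return the total (cpu, memory) pressure already placed on this host."""
        cpu = sum(w.cpu_pressure for w in self.workloads)
        mem = sum(w.memory_pressure for w in self.workloads)
        return cpu, mem


def interference_score(host, candidate):
    """Illustrative score: how far the combined pressure would exceed host capacity."""
    cpu, mem = host.pressure()
    over_cpu = max(0.0, cpu + candidate.cpu_pressure - 1.0)
    over_mem = max(0.0, mem + candidate.memory_pressure - 1.0)
    score = over_cpu + over_mem
    # Assumed policy: contention costs more when a latency-critical workload is involved.
    if candidate.latency_critical or any(w.latency_critical for w in host.workloads):
        score *= 2.0
    return score


def place(hosts, candidate):
    """Place the candidate on the lowest-scoring host; ties go to the less loaded host."""
    best = min(hosts, key=lambda h: (interference_score(h, candidate), sum(h.pressure())))
    best.workloads.append(candidate)
    return best


hosts = [Host("host-a"), Host("host-b")]
first = place(hosts, Profile("cache", 0.3, 0.8, latency_critical=True))
second = place(hosts, Profile("batch", 0.6, 0.7))
```

In this sketch the memory-heavy batch job is steered away from the host that already runs the latency-critical cache, because co-locating them would oversubscribe the assumed memory capacity. A real scheduler would derive the pressures from measured counters rather than fixed numbers.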

Acknowledgments

This research was supported by a grant of the SKT-SNU SDDC R&D Collaboration Program through the SK Telecom Corporate R&D Center, funded by SK Telecom (Grant Number: 1519C00101-616052). It was also partly supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (R0190-16-2012, High Performance Big Data Analytics Platform Performance Acceleration Technologies Development), and partly supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2013R1A1A2064629). In addition, this work was partly supported by BK21 Plus for Pioneers in Innovative Computing (Dept. of Computer Science and Engineering, SNU), funded by the National Research Foundation of Korea (NRF) (21A20151113068).

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea
Yoonsung Nam, Minkyu Kang, Hanul Sung & Hyeonsang Eom
AI Tech Lab., Future Technology R&D Center, Corporate R&D Center, SK Telecom, Seoul, South Korea
Jincheol Kim

Corresponding author

Correspondence to Hyeonsang Eom.

About this article

Cite this article

Nam, Y., Kang, M., Sung, H. et al. Workload-aware resource management for software-defined compute. Cluster Comput 19, 1555–1570 (2016). https://doi.org/10.1007/s10586-016-0613-6

Received: 31 May 2016
Accepted: 31 July 2016
Published: 18 August 2016
Issue Date: September 2016
DOI: https://doi.org/10.1007/s10586-016-0613-6
