
Workload-aware resource management for software-defined compute

Cluster Computing

Abstract

With the advance of cloud computing technologies, increasingly diverse and heterogeneous workloads run on cloud datacenters. As more workloads are consolidated onto a datacenter, contention for the limited shared resources grows, which complicates resource management and often leads to low resource utilization. For effective resource management, datacenter management software should be redesigned to operate in a software-defined way, dynamically allocating the "right" resources to each workload based on its characteristics, so that operators can reduce operating costs while meeting service level objectives such as latency requirements. However, current datacenter resource management frameworks do not operate in such a software-defined way, leading not only to wasted resources but also to performance degradation. To address this problem, we have designed and developed a workload-aware resource management framework for software-defined compute. The framework consists mainly of a workload profiler and workload-aware schedulers. To demonstrate the effectiveness of the framework, we have prototyped schedulers that minimize interference on the shared computing and memory resources, compared them with the existing schedulers in OpenStack and VMware vSphere testbeds, and evaluated their effectiveness in high-contention scenarios. Our experimental study suggests that our proposed approach can improve throughput by up to 100 % and reduce tail latency by up to 95 % for latency-critical workloads compared to the existing schedulers.
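To make the abstract's design concrete, the following is a minimal sketch, in Python since OpenStack schedulers are Python-based, of how a workload-aware scheduler might combine profiler output with per-host contention state to choose the least-interfering placement. All names (WorkloadProfile, HostState, pick_host) and the scoring heuristic are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of an interference-minimizing placement filter.
# The profile fields and scoring weights are assumptions for illustration;
# the paper's profiler and schedulers may use different metrics.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkloadProfile:
    """Profile gathered offline or online, e.g., via hardware perf counters."""
    cpu_demand: float        # average cores consumed
    llc_mpki: float          # last-level-cache misses per kilo-instruction
    latency_critical: bool   # must meet a tail-latency SLO


@dataclass
class HostState:
    name: str
    free_cores: float
    llc_pressure: float      # aggregate LLC MPKI of co-located workloads


def interference_score(w: WorkloadProfile, h: HostState) -> float:
    """Lower is better: estimated shared-resource contention after placement."""
    # Assumed heuristic: memory-subsystem contention matters most for
    # cache-sensitive workloads, so weight it more for latency-critical ones.
    weight = 2.0 if w.latency_critical else 1.0
    return weight * (h.llc_pressure + w.llc_mpki)


def pick_host(w: WorkloadProfile, hosts: List[HostState]) -> Optional[HostState]:
    """Filter hosts with enough free CPU, then pick the least-contended one."""
    feasible = [h for h in hosts if h.free_cores >= w.cpu_demand]
    if not feasible:
        return None
    return min(feasible, key=lambda h: interference_score(w, h))


if __name__ == "__main__":
    hosts = [HostState("node-1", 4.0, 12.0), HostState("node-2", 2.0, 3.0)]
    w = WorkloadProfile(cpu_demand=2.0, llc_mpki=5.0, latency_critical=True)
    chosen = pick_host(w, hosts)
    print(chosen.name if chosen else "no feasible host")  # -> node-2
```

A filter-then-score structure like this mirrors how OpenStack's filter scheduler is organized; the workload-aware part is that the score is driven by profiled contention metrics rather than free capacity alone.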




Acknowledgments

This research was supported by a grant of the SKT-SNU SDDC R&D Collaboration Program through the SK Telecom Corporate R&D Center funded by SK Telecom (Grant Number: 1519C00101-616052). It was also partly supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (R0190-16-2012, High Performance Big Data Analytics Platform Performance Acceleration Technologies Development), partly by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2013R1A1A2064629), and partly by BK21 Plus for Pioneers in Innovative Computing (Dept. of Computer Science and Engineering, SNU) funded by the National Research Foundation of Korea (NRF) (21A20151113068).

Author information

Corresponding author

Correspondence to Hyeonsang Eom.


About this article


Cite this article

Nam, Y., Kang, M., Sung, H. et al. Workload-aware resource management for software-defined compute. Cluster Comput 19, 1555–1570 (2016). https://doi.org/10.1007/s10586-016-0613-6
