Abstract
With the advance of cloud computing technologies, increasingly diverse and heterogeneous workloads run in cloud datacenters. As more workloads share a datacenter, contention for its limited shared resources can increase. This makes the resources harder to manage and often leads to low resource utilization. For effective resource management, datacenter management software should be redesigned to operate in a software-defined way: it should dynamically allocate the “right” resources to each workload based on that workload's characteristics, so that operating costs decrease while service level objectives, such as latency requirements, are still met. However, recent datacenter resource management frameworks do not operate in such a software-defined way, which leads not only to wasted resources but also to performance degradation. To address this problem, we have designed and developed a workload-aware resource management framework for software-defined compute. The framework consists mainly of a workload profiler and workload-aware schedulers. To demonstrate the framework's effectiveness, we have prototyped schedulers that minimize interference on shared computing and memory resources. We compared them with the existing schedulers in OpenStack and VMware vSphere testbeds and evaluated their effectiveness in high-contention scenarios. Our experimental study suggests that the proposed approach can yield up to 100% higher throughput and up to 95% lower tail latency for latency-critical workloads compared with the existing schedulers.
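The abstract describes the workload-aware schedulers only at a high level. As a hedged illustration of what interference-aware placement can look like, the sketch below ranks candidate hosts by a pairwise contention score built from a profiled memory-intensity metric, last-level cache misses per thousand instructions (MPKI). The choice of MPKI, the product-based score, the `LC_WEIGHT` penalty for latency-critical tenants, and every name in the code are assumptions made for illustration; this is not the authors' scheduling algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical weight: contention involving a latency-critical tenant counts double.
LC_WEIGHT = 2.0


@dataclass
class Workload:
    name: str
    vcpus: int
    mpki: float  # assumed profiler output: last-level cache misses per 1000 instructions
    latency_critical: bool = False


@dataclass
class Host:
    name: str
    cores: int
    placed: List[Workload] = field(default_factory=list)

    def free_cores(self) -> int:
        return self.cores - sum(w.vcpus for w in self.placed)


def interference_score(host: Host, wl: Workload) -> float:
    """Estimate contention if `wl` joins `host`; lower is better."""
    score = 0.0
    for other in host.placed:
        # Pairwise proxy: two memory-intensive tenants contend for LLC and memory bandwidth.
        pair = other.mpki * wl.mpki
        if other.latency_critical or wl.latency_critical:
            pair *= LC_WEIGHT
        score += pair
    return score


def place(hosts: List[Host], wl: Workload) -> Optional[Host]:
    """Place `wl` on the feasible host with the lowest interference score."""
    feasible = [h for h in hosts if h.free_cores() >= wl.vcpus]
    if not feasible:
        return None  # no host has enough free cores
    # min() returns the first host among equal scores, so ties go to list order.
    best = min(feasible, key=lambda h: interference_score(h, wl))
    best.placed.append(wl)
    return best
```

With two empty 8-core hosts, a memory-intensive job lands on the first host, and a subsequent latency-critical job is steered to the second host instead of sharing with it. A production scheduler would more likely replace the product-of-MPKI proxy with slowdowns measured or predicted by the workload profiler.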
Acknowledgments
This research was supported by a grant from the SKT-SNU SDDC R&D Collaboration Program through the SK Telecom Corporate R&D Center, funded by SK Telecom (Grant Number: 1519C00101-616052). It was also partly supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (R0190-16-2012, High Performance Big Data Analytics Platform Performance Acceleration Technologies Development), and partly supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2013R1A1A2064629). In addition, this work was partly supported by BK21 Plus for Pioneers in Innovative Computing (Dept. of Computer Science and Engineering, SNU), funded by the National Research Foundation of Korea (NRF) (21A20151113068).
About this article
Cite this article
Nam, Y., Kang, M., Sung, H. et al. Workload-aware resource management for software-defined compute. Cluster Comput 19, 1555–1570 (2016). https://doi.org/10.1007/s10586-016-0613-6
DOI: https://doi.org/10.1007/s10586-016-0613-6