ABSTRACT
The simplest strategy to guarantee good quality of service (QoS) for a latency-sensitive workload with sub-millisecond response-time requirements in a shared cluster environment is to never run other workloads concurrently with it on the same server. Unfortunately, this inevitably leads to low server utilization, reducing both the capability and cost effectiveness of the cluster.
In this paper, we analyze the challenges of maintaining high QoS for low-latency workloads when sharing servers with other workloads. We show that workload co-location leads to QoS violations due to increases in queuing delay, scheduling delay, and thread load imbalance. We present techniques that address these vulnerabilities, ranging from provisioning the latency-critical service in an interference-aware manner, to replacing the Linux CFS scheduler with a scheduler that provides good latency guarantees and fairness for co-located workloads. Ultimately, we demonstrate that some latency-critical workloads can be aggressively co-located with other workloads while still achieving good QoS, and that such co-location can improve a datacenter's effective throughput per TCO-$ by up to 52%.
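The connection between server utilization and queuing delay can be illustrated with a textbook M/M/1 model (an illustrative assumption on our part; the paper's analysis is more detailed). In an M/M/1 queue, total time in system is exponentially distributed with rate μ − λ, so tail latency grows sharply as co-located work pushes utilization toward saturation. The service rate below is a hypothetical value chosen to give a sub-millisecond mean service time:

```python
import math

def mm1_latency_percentile(service_rate, arrival_rate, p):
    """Return the p-th percentile of total time in system (queueing +
    service) for an M/M/1 queue. Sojourn time in M/M/1 is exponential
    with rate (mu - lambda), so the percentile is -ln(1-p)/(mu - lambda)."""
    assert 0 < arrival_rate < service_rate, "queue must be stable"
    return -math.log(1.0 - p) / (service_rate - arrival_rate)

MU = 10_000.0  # assumed service rate: 10k req/s, i.e. 100 us mean service time

# Tail latency as utilization rises, e.g. due to co-located load.
for util in (0.3, 0.7, 0.9):
    lam = util * MU
    p99_ms = mm1_latency_percentile(MU, lam, 0.99) * 1e3
    print(f"utilization {util:.0%}: p99 latency {p99_ms:.2f} ms")
```

Under this simple model, raising utilization from 30% to 90% multiplies the 99th-percentile latency by a factor of (μ − 0.3μ)/(μ − 0.9μ) = 7, which is why even modest co-location can violate a sub-millisecond QoS target unless interference is managed.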
- Reconciling high server utilization and sub-millisecond quality-of-service