DOI:10.1145/2901318.2901330
research-article

PSLO: enforcing the Xth percentile latency and throughput SLOs for consolidated VM storage

Published: 18 April 2016

Abstract

It is desirable but challenging to simultaneously support a latency SLO at a pre-defined percentile (the Xth percentile latency SLO) and a throughput SLO for consolidated VM storage. Ensuring the Xth percentile latency helps differentiate service levels accurately in terms of application-level latency SLO compliance, especially for applications built on multiple VMs. However, enforcing the Xth percentile latency SLO and enforcing the throughput SLO are opposite sides of the same coin, because they place conflicting requirements on the level of IO concurrency. To address this challenge, this paper proposes PSLO, a framework that supports the Xth percentile latency and throughput SLOs in a consolidated VM environment by precisely coordinating the level of IO concurrency and the arrival rate for each VM issue queue. PSLO can also take full advantage of the IO capacity left available under the SLO constraints to improve throughput or reduce latency on a best-effort basis. We design and implement a PSLO prototype in a real VM consolidation environment created with Xen. Our extensive trace-driven prototype evaluation shows that the system is able to optimize the Xth percentile latency and throughput for consolidated VMs under SLO constraints.
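
To make the notion of an Xth percentile latency SLO concrete, the following minimal Python sketch checks a window of latency samples against a percentile target and then nudges a per-VM concurrency limit. It is an illustration only: the abstract does not describe PSLO's actual control law, so the function names, the nearest-rank percentile definition, the `max_depth` cap, and the halve-on-violation rule are all assumptions made here, not the authors' method.

```python
import math


def percentile(latencies_ms, x):
    """Nearest-rank Xth percentile (0 < x <= 100) of a non-empty sample."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # 1-based nearest rank: the smallest value with at least x% of samples at or below it.
    rank = math.ceil(x * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_slo(latencies_ms, x, target_ms):
    """True if the Xth percentile latency of the window is within the target."""
    return percentile(latencies_ms, x) <= target_ms


def next_concurrency(current, latencies_ms, x, target_ms, max_depth=64):
    """Hypothetical one-step adjustment of a VM's IO concurrency limit.

    Assumed rule (not from the paper): grow the limit by one while the
    latency SLO holds, to favor throughput; halve it on a violation, to
    favor latency. The limit never drops below 1 or exceeds max_depth.
    """
    if meets_latency_slo(latencies_ms, x, target_ms):
        return min(current + 1, max_depth)
    return max(current // 2, 1)
```

For example, with samples of 1 ms through 20 ms, `percentile(samples, 95)` is 19, so a 19 ms target at the 95th percentile is met and `next_concurrency(8, samples, 95, 19)` raises the limit to 9, while an 18 ms target is violated and the limit drops to 4. The two branches mirror the tension the abstract describes: higher concurrency helps throughput, lower concurrency helps tail latency.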

Published In

EuroSys '16: Proceedings of the Eleventh European Conference on Computer Systems
April 2016
605 pages
ISBN:9781450342407
DOI:10.1145/2901318
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Funding Sources

  • US NSF
  • NSFC
  • National High-tech R&D Program of China (863 Program)

Conference

EuroSys '16
EuroSys '16: Eleventh EuroSys Conference 2016
April 18 - 21, 2016
London, United Kingdom

Acceptance Rates

EuroSys '16 Paper Acceptance Rate 38 of 180 submissions, 21%;
Overall Acceptance Rate 241 of 1,308 submissions, 18%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 17
  • Downloads (last 6 weeks): 2
Reflects downloads up to 02 Mar 2025.

Citations

Cited By

  • (2025) FleetIO: Managing Multi-Tenant Cloud Storage with Multi-Agent Reinforcement Learning. Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, pp. 478-492. DOI:10.1145/3669940.3707229. Online publication date: 30-Mar-2025
  • (2024) zQoS: Unleashing full performance capabilities of NVMe SSDs while enforcing SLOs in distributed storage systems. Proceedings of the 53rd International Conference on Parallel Processing, pp. 618-628. DOI:10.1145/3673038.3673156. Online publication date: 12-Aug-2024
  • (2024) A Latency-Predictable Cloud-Native Network Architecture based on XDP. 2024 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA), pp. 661-668. DOI:10.1109/ISPA63168.2024.00090. Online publication date: 30-Oct-2024
  • (2023) Pushing Performance Isolation Boundaries into Application with pBox. Proceedings of the 29th Symposium on Operating Systems Principles, pp. 247-263. DOI:10.1145/3600006.3613159. Online publication date: 23-Oct-2023
  • (2023) TailGuard: Tail Latency SLO Guaranteed Task Scheduling for Data-Intensive User-Facing Applications. 2023 IEEE 43rd International Conference on Distributed Computing Systems (ICDCS), pp. 898-909. DOI:10.1109/ICDCS57875.2023.00042. Online publication date: Jul-2023
  • (2023) Taming Metadata-intensive HPC Jobs Through Dynamic, Application-agnostic QoS Control. 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 47-61. DOI:10.1109/CCGrid57682.2023.00015. Online publication date: May-2023
  • (2022) QWin: Core Allocation for Enforcing Differentiated Tail Latency SLOs at Shared Storage Backend. 2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS), pp. 1098-1109. DOI:10.1109/ICDCS54860.2022.00109. Online publication date: Jul-2022
  • (2022) Burger-tree: A Three-Layer Cache-Conscious Tree Index for Persistent Memory. 2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), pp. 951-956. DOI:10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00152. Online publication date: Dec-2022
  • (2022) Layered Contention Mitigation for Cloud Storage. 2022 IEEE 15th International Conference on Cloud Computing (CLOUD), pp. 167-178. DOI:10.1109/CLOUD55607.2022.00036. Online publication date: Jul-2022
  • (2020) FVM. Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation, pp. 955-971. DOI:10.5555/3488766.3488820. Online publication date: 4-Nov-2020
