DOI: 10.1145/2451116.2451126
research-article

ReQoS: reactive static/dynamic compilation for QoS in warehouse scale computers

Published: 16 March 2013

Abstract

As multicore processors with expanding core counts continue to dominate the server market, the overall utilization of the class of datacenters known as warehouse scale computers (WSCs) depends heavily on colocation of multiple workloads on each server to take advantage of the computational power provided by modern processors. However, many of the applications running in WSCs, such as websearch, are user-facing and have quality of service (QoS) requirements. When multiple applications are co-located on a multicore machine, contention for shared memory resources threatens application QoS as severe cross-core performance interference may occur. WSC operators are left with two options: either disregard QoS to maximize WSC utilization, or disallow the co-location of high-priority user-facing applications with other applications, resulting in low machine utilization and millions of dollars wasted.
This paper presents ReQoS, a static/dynamic compilation approach that enables low-priority applications to adaptively manipulate their own contentiousness to ensure the QoS of high-priority co-runners. ReQoS is composed of a profile-guided compilation technique that identifies and inserts markers in contentious code regions in low-priority applications, and a lightweight runtime that monitors the QoS of high-priority applications and reactively reduces the pressure that low-priority applications place on the memory subsystem when cross-core interference is detected. In this work, we show that ReQoS can accurately diagnose contention and significantly reduce performance interference to ensure application QoS. Applying ReQoS to SPEC2006 and SmashBench workloads on real multicore machines, we are able to improve machine utilization by more than 70% in many cases, and more than 50% on average, while enforcing a 90% QoS threshold. We are also able to improve the energy efficiency of modern multicore machines by 47% on average over a policy of disallowing co-locations.
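The two halves of ReQoS described above (a profile-guided static pass that marks contentious regions, and a runtime that reacts to the measured QoS of the high-priority co-runner) can be illustrated with the minimal Python sketch below. It is not the authors' implementation. It assumes that QoS is measured as the high-priority application's co-located IPC divided by its solo IPC, that contentiousness is judged from a per-region profiled last-level-cache miss rate (MPKI), and that the runtime reduces memory pressure by briefly pausing ("napping") whenever execution enters a marked region. The names qos, mark_contentious_regions, ReactiveThrottle and nap_ms, the 10-point step, the 10 ms quantum and the profile values are all hypothetical; only the 90% QoS threshold comes from the abstract.

```python
# Hypothetical sketch of a ReQoS-style static/dynamic scheme; not the paper's code.
from __future__ import annotations

from dataclasses import dataclass


def qos(ipc_corun: float, ipc_solo: float) -> float:
    """Assumed QoS metric: co-located IPC normalized to solo IPC (1.0 = no interference)."""
    return ipc_corun / ipc_solo


def mark_contentious_regions(profile: dict[str, float], mpki_threshold: float) -> set[str]:
    """Static step (assumed heuristic): mark regions whose profiled LLC MPKI exceeds a threshold."""
    return {region for region, mpki in profile.items() if mpki > mpki_threshold}


@dataclass
class ReactiveThrottle:
    """Dynamic step: adjusts how hard marked regions of the low-priority app are throttled."""

    qos_target: float = 0.90  # the 90% QoS threshold quoted in the abstract
    step_percent: int = 10    # hypothetical adjustment granularity
    nap_percent: int = 0      # 0 = run at full speed, 100 = nap for the whole quantum

    def update(self, measured_qos: float) -> int:
        """React to one QoS sample of the high-priority app and return the new nap level."""
        if measured_qos < self.qos_target:
            # QoS violated: assume cross-core interference and throttle harder.
            self.nap_percent = min(100, self.nap_percent + self.step_percent)
        else:
            # QoS met: hand throughput back to the low-priority app.
            self.nap_percent = max(0, self.nap_percent - self.step_percent)
        return self.nap_percent


def nap_ms(region: str, marked: set[str], nap_percent: int, quantum_ms: float = 10.0) -> float:
    """Time the low-priority app naps on entering `region`; unmarked regions never nap."""
    return quantum_ms * nap_percent / 100 if region in marked else 0.0


if __name__ == "__main__":
    # Hypothetical profile: region name -> LLC misses per kilo-instruction.
    profile = {"hash_join": 22.5, "parse_input": 0.8, "scan_table": 15.0}
    marked = mark_contentious_regions(profile, mpki_threshold=10.0)
    throttle = ReactiveThrottle()
    for ipc in (1.2, 1.3, 1.45, 1.5):  # co-located IPC samples; solo IPC assumed to be 1.5
        level = throttle.update(qos(ipc, ipc_solo=1.5))
        print(f"QoS={qos(ipc, 1.5):.2f} nap={level}% hash_join naps {nap_ms('hash_join', marked, level)} ms")
```

In this sketch each call to update() reacts to a single QoS sample: a reading below the 90% target raises the nap level by one step, and a reading at or above it lowers the level again, so the low-priority application is slowed only while interference is actually observed and only inside the regions the static pass marked. The sampling interval, step size and nap quantum are fixed assumptions here; a real runtime would have to tune them.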




      Published In

      ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
      March 2013, 574 pages
      ISBN: 9781450318709
      DOI: 10.1145/2451116
      • ACM SIGARCH Computer Architecture News, Volume 41, Issue 1 (ASPLOS '13)
        March 2013, 540 pages
        ISSN: 0163-5964
        DOI: 10.1145/2490301
      • ACM SIGPLAN Notices, Volume 48, Issue 4 (ASPLOS '13)
        April 2013, 540 pages
        ISSN: 0362-1340, EISSN: 1558-1160
        DOI: 10.1145/2499368
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 16 March 2013

      Author Tags

      1. compiler
      2. contention
      3. cross-core interference
      4. datacenter
      5. dynamic techniques
      6. multicore
      7. online adaptation
      8. quality of service
      9. runtime systems
      10. warehouse scale computers

      Qualifiers

      • Research-article

      Conference

      ASPLOS '13

      Acceptance Rates

      Overall Acceptance Rate 535 of 2,713 submissions, 20%

      Article Metrics

      • Downloads (last 12 months): 25
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 20 Jan 2025

      Cited By

      • (2024) Characterizing the Performance of Emerging Deep Learning, Graph, and High Performance Computing Workloads Under Interference. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 468-477. DOI: 10.1109/IPDPSW63119.2024.00098. Online publication date: 27-May-2024.
      • (2023) Online Performance Modeling and Prediction for Single-VM Applications in Multi-Tenant Clouds. IEEE Transactions on Cloud Computing 11(1), pp. 97-110. DOI: 10.1109/TCC.2021.3078690. Online publication date: 1-Jan-2023.
      • (2023) MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 828-841. DOI: 10.1109/HPCA56546.2023.10071035. Online publication date: Feb-2023.
      • (2022) Guaranteeing Performance SLAs of Cloud Applications Under Resource Storms. IEEE Transactions on Cloud Computing 10(2), pp. 1329-1343. DOI: 10.1109/TCC.2020.2985372. Online publication date: 1-Apr-2022.
      • (2022) PIMCloud: QoS-Aware Resource Management of Latency-Critical Applications in Clouds with Processing-in-Memory. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 1086-1099. DOI: 10.1109/HPCA53966.2022.00083. Online publication date: Apr-2022.
      • (2020) uPredict: A User-Level Profiler-Based Predictive Framework in Multi-Tenant Clouds. 2020 IEEE International Conference on Cloud Engineering (IC2E), pp. 73-82. DOI: 10.1109/IC2E48712.2020.00015. Online publication date: Apr-2020.
      • (2020) DiHi: Distributed and Hierarchical Performance Modeling of Multi-VM Cloud Running Applications. 2020 IEEE 22nd International Conference on High Performance Computing and Communications; IEEE 18th International Conference on Smart City; IEEE 6th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 1-10. DOI: 10.1109/HPCC-SmartCity-DSS50907.2020.00002. Online publication date: Dec-2020.
      • (2020) CLITE: Efficient and QoS-Aware Co-Location of Multiple Latency-Critical Jobs for Warehouse Scale Computers. 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 193-206. DOI: 10.1109/HPCA47549.2020.00025. Online publication date: Feb-2020.
      • (2019) DICER. Proceedings of the 48th International Conference on Parallel Processing, pp. 1-10. DOI: 10.1145/3337821.3337891. Online publication date: 5-Aug-2019.
      • (2019) Caliper. ACM Transactions on Architecture and Code Optimization 16(3), pp. 1-25. DOI: 10.1145/3323090. Online publication date: 17-Jun-2019.
