DOI: 10.1145/3330345.3330364
research-article

QoSMT: supporting precise performance control for simultaneous multithreading architecture

Published: 26 June 2019

Abstract

Simultaneous multithreading (SMT) improves CPU throughput but also causes unpredictable performance fluctuations for co-located workloads. Although recent major SMT processors have added some hardware support for quality of service (QoS), achieving both precise performance control and high throughput on SMT architectures remains a challenging open problem.
In this paper, we perform comprehensive experiments on real SMT systems and cycle-accurate simulators. These experiments show that almost all in-core resources may suffer severe contention as workloads vary, and we consider this observation the fundamental reason the problem is hard. We therefore introduce QoSMT, a novel hardware scheme that uses a closed-loop control mechanism to enforce precise performance targets, e.g., 85%, 90%, or 95% of the performance of a workload running alone. We implement a prototype on the gem5 simulator. Experimental results show control errors of only 1.4%, 0.5%, and 3.6% for these three targets, respectively.
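To make the idea of closed-loop performance control concrete, the sketch below shows a generic proportional feedback loop in Python. It is not the QoSMT hardware design, which the abstract does not detail. The performance model `toy_ipc`, the single `share` knob, the `gain` value, and the relative definition of control error are all assumptions introduced here for illustration only.

```python
# Illustrative sketch only: a generic proportional feedback loop,
# NOT the QoSMT mechanism. All names and the model are hypothetical.

def toy_ipc(share, solo_ipc=2.0):
    """Hypothetical model: IPC of the target thread as a function of its
    share (0..1) of in-core resources, with diminishing returns.
    At share = 1.0 the thread reaches its solo (running-alone) IPC."""
    return solo_ipc * (2.0 * share - share * share)

def control_loop(target_fraction, solo_ipc=2.0, gain=0.5, steps=50):
    """Adjust the resource share until the measured performance, expressed
    as a fraction of solo performance, matches target_fraction."""
    share = 0.5  # arbitrary starting allocation
    for _ in range(steps):
        measured = toy_ipc(share, solo_ipc) / solo_ipc  # feedback signal
        error = target_fraction - measured              # deviation from target
        # Proportional correction, clamped to the valid range [0, 1].
        share = min(1.0, max(0.0, share + gain * error))
    achieved = toy_ipc(share, solo_ipc) / solo_ipc
    return share, achieved

def control_error(target_fraction, achieved_fraction):
    """Assumed definition: relative deviation of achieved from target."""
    return abs(achieved_fraction - target_fraction) / target_fraction

if __name__ == "__main__":
    for target in (0.85, 0.90, 0.95):
        share, achieved = control_loop(target)
        print(f"target={target:.2f} share={share:.3f} "
              f"achieved={achieved:.4f} "
              f"error={control_error(target, achieved):.2e}")
```

In this toy setting the loop converges because the model is smooth and monotonic. A real SMT core has many interacting resources and noisy measurements, so the figures reported in the abstract should not be expected from a sketch like this.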

    Published In

    ICS '19: Proceedings of the ACM International Conference on Supercomputing
    June 2019
    533 pages
    ISBN:9781450360791
    DOI:10.1145/3330345

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. QoS
    2. SMT interference
    3. data center
    4. performance predictability

    Qualifiers

    • Research-article

    Funding Sources

    • National Key R&D Program of China (2016YFB1000201)
    • Primary Research & Development Plan of Shaanxi Province (2019TSLGY08-03)
    • Youth Innovation Promotion Association of Chinese Academy of Sciences (2013073)
    • National Natural Science Foundation of China (Grant No. 61420106013 and 61702480)

    Conference

    ICS '19

    Acceptance Rates

    Overall Acceptance Rate 629 of 2,180 submissions, 29%

    Cited By

    • (2025) Retrospecting Available CPU Resources: SMT-Aware Scheduling to Prevent SLA Violations in Data Centers. IEEE Transactions on Parallel and Distributed Systems 36:1 (67-83). DOI: 10.1109/TPDS.2024.3494879. Online publication date: Jan-2025
    • (2024) Flexible Computing: A New Framework for Improving Resource Allocation and Scheduling in Elastic Computing. IEEE Transactions on Services Computing (1-14). DOI: 10.1109/TSC.2024.3489433. Online publication date: 2024
    • (2024) Exploring Machine Learning Approaches for QoS Prediction on SMT Processors. 2024 11th International Conference on Future Internet of Things and Cloud (FiCloud) (302-307). DOI: 10.1109/FiCloud62933.2024.00054. Online publication date: 19-Aug-2024
    • (2023) GQoSMT: On Guaranteeing the Quality of Service Requirements of Simultaneous Multithreading Processors. 2023 8th International Conference on Computer Science and Engineering (UBMK) (234-239). DOI: 10.1109/UBMK59864.2023.10286669. Online publication date: 13-Sep-2023
    • (2022) SDCBench: A Benchmark Suite for Workload Colocation and Evaluation in Datacenters. Intelligent Computing 2022. DOI: 10.34133/2022/9810691. Online publication date: Jan-2022
    • (2020) A Learning-based Fetch Thread Gating Mechanism for A Simultaneous Multithreading Processor. 2020 Eighth International Symposium on Computing and Networking (CANDAR) (1-10). DOI: 10.1109/CANDAR51075.2020.00011. Online publication date: Nov-2020
