QoSMT: supporting precise performance control for simultaneous multithreading architecture

Authors:

Yungang Bao

ICS '19: Proceedings of the ACM International Conference on Supercomputing

Pages 206 - 216

https://doi.org/10.1145/3330345.3330364

Published: 26 June 2019

Abstract

Simultaneous multithreading (SMT) improves CPU throughput but also causes unpredictable performance fluctuations for co-located workloads. Although recent mainstream SMT processors have adopted some hardware support for quality of service (QoS), achieving both precise performance control and high throughput on SMT architectures remains a challenging open problem.

In this paper, we perform comprehensive experiments on real SMT systems and cycle-accurate simulators. From these experiments, we observe that almost all in-core resources may suffer severe contention as workloads vary. We regard this observation as the fundamental cause of the problem above. We therefore introduce QoSMT, a novel hardware scheme that uses a closed-loop control mechanism to enforce precise performance control for specific targets, e.g., achieving 85%, 90%, or 95% of the performance of a workload running alone. We implement a prototype on the gem5 simulator. Experimental results show that the control error is only 1.4%, 0.5%, and 3.6% for these three targets, respectively.
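
To make the idea of closed-loop performance control concrete, the following is a minimal sketch and not the QoSMT hardware design, whose details are not given in this abstract. It assumes a single hypothetical knob, `share` (the fraction of in-core resources granted to the target thread), and a toy model `normalized_perf` that maps that share to a fraction of solo performance. Both the model and the integral-style update rule are illustrative assumptions. The control error is taken here as the relative deviation of achieved performance from the target, which is also an assumption about the metric.

```python
def normalized_perf(share: float) -> float:
    """Toy model (assumption): fraction of solo performance achieved when the
    target thread receives `share` of the contended in-core resources."""
    # Monotonic and concave: 0.5 with no resources, 1.0 with all of them.
    return 0.5 + 0.5 * share ** 0.5


def run_control_loop(target: float, gain: float = 1.0, steps: int = 50) -> tuple:
    """Adjust the hypothetical `share` knob until measured performance
    matches `target` (a fraction of solo performance)."""
    share = 0.5  # arbitrary starting allocation
    for _ in range(steps):
        measured = normalized_perf(share)   # feedback: observe performance
        error = target - measured           # positive if the thread is too slow
        share += gain * error               # give more (or fewer) resources
        share = min(1.0, max(0.0, share))   # keep the allocation valid
    return share, normalized_perf(share)


if __name__ == "__main__":
    for target in (0.85, 0.90, 0.95):
        share, achieved = run_control_loop(target)
        control_error = abs(achieved - target) / target
        print(f"target={target:.2f} share={share:.3f} "
              f"achieved={achieved:.4f} control_error={control_error:.2e}")
```

In this toy setting the loop converges because the model is monotonic in `share`. A real SMT core has many interacting resources and noisy measurements, so the errors reported in the abstract should not be compared with the output of this sketch.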

Index Terms

QoSMT: supporting precise performance control for simultaneous multithreading architecture
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
      1. Cloud computing

Published In

ICS '19: Proceedings of the ACM International Conference on Supercomputing

June 2019

533 pages

ISBN:9781450360791

DOI:10.1145/3330345

General Chair: Rudolf Eigenmann (University of Delaware)
Program Chairs: Chen Ding (University of Rochester), Sally A. McKee (Clemson University)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Key R&D Program of China (2016YFB1000201)
Primary Research & Development Plan of Shaanxi Province (2019TSLGY08-03)
Youth Innovation Promotion Association of Chinese Academy of Sciences (2013073)
National Natural Science Foundation of China (Grant Nos. 61420106013 and 61702480)

Conference

ICS '19

Sponsor:

SIGARCH

ICS '19: 2019 International Conference on Supercomputing

June 26 - 28, 2019

Phoenix, Arizona

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics

Total Citations: 6
Total Downloads: 380
Downloads (Last 12 months): 36
Downloads (Last 6 weeks): 3

Reflects downloads up to 03 Mar 2025


Cited By

Liao H, Liu T, Guo J, Huang B, Yang D, Ding J. (2025). Retrospecting Available CPU Resources: SMT-Aware Scheduling to Prevent SLA Violations in Data Centers. IEEE Transactions on Parallel and Distributed Systems 36:1 (67-83). Online publication date: Jan-2025.
https://doi.org/10.1109/TPDS.2024.3494879
Cao W, Gu J, Ming Z, Cai Z, Wang Y, Ji C, Xiao Z, Feng Y, Liu Y, Zhang L. (2024). Flexible Computing: A New Framework for Improving Resource Allocation and Scheduling in Elastic Computing. IEEE Transactions on Services Computing (1-14). Online publication date: 2024.
https://doi.org/10.1109/TSC.2024.3489433
Sari S, Demir O, Kucuk G. (2024). Exploring Machine Learning Approaches for QoS Prediction on SMT Processors. 2024 11th International Conference on Future Internet of Things and Cloud (FiCloud) (302-307). Online publication date: 19-Aug-2024.
https://doi.org/10.1109/FiCloud62933.2024.00054
Küçük G, Tokatlı N, Nezir U, Pektaş E, Mete E, Gökçek G, Güney M, Alsharif S, Baiat Z. (2023). GQoSMT: On Guaranteeing the Quality of Service Requirements of Simultaneous Multithreading Processors. 2023 8th International Conference on Computer Science and Engineering (UBMK) (234-239). Online publication date: 13-Sep-2023.
https://doi.org/10.1109/UBMK59864.2023.10286669
Yang Y, Kong X, Zhao L, Li Y, Zhang H, Li J, Qi H, Li K. (2022). SDCBench: A Benchmark Suite for Workload Colocation and Evaluation in Datacenters. Intelligent Computing 2022. Online publication date: Jan-2022.
https://doi.org/10.34133/2022/9810691
Ide Y, Yamasaki N. (2020). A Learning-based Fetch Thread Gating Mechanism for A Simultaneous Multithreading Processor. 2020 Eighth International Symposium on Computing and Networking (CANDAR) (1-10). Online publication date: Nov-2020.
https://doi.org/10.1109/CANDAR51075.2020.00011
