research-article

PAC: Preference-Aware Co-location Scheduling on Heterogeneous NUMA Architectures To Improve Resource Utilization

Authors:
Pu Pang, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China, https://orcid.org/0009-0004-3685-0901
Yaoxuan Li, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China, https://orcid.org/0009-0007-4894-3840
Bo Liu, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China, https://orcid.org/0009-0004-5163-9661
Quan Chen, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China, https://orcid.org/0000-0001-5832-0347
Zhou Yu, Shuhai Lab, Huawei Cloud Computing Technologies Co., Ltd, Hangzhou, China, https://orcid.org/0009-0004-8160-9770
Zhibin Yu, Shuhai Lab, Huawei Cloud Computing Technologies Co., Ltd, Shenzhen, China, https://orcid.org/0000-0001-8067-9612
Deze Zeng, School of Computer Science, China University of Geosciences, Wuhan, China, https://orcid.org/0000-0003-3276-1202
Jingwen Leng, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China, https://orcid.org/0000-0002-5660-5493
Jieru Zhao, Department of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai, China, https://orcid.org/0000-0001-8211-2812
Minyi Guo, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China, https://orcid.org/0000-0003-0034-2302

ICS '23: Proceedings of the 37th International Conference on Supercomputing, June 2023, Pages 75–86, https://doi.org/10.1145/3577193.3593709

Published: 21 June 2023

ABSTRACT

Latency-critical applications interact directly with end users and often experience a diurnal load pattern. In production, best-effort applications are often co-located with them to use the cores that sit idle at low load. Meanwhile, modern computers are evolving towards heterogeneous NUMA architectures, in which cores differ in computation ability, memory access latency, and network communication delay. Prior co-location scheduling work did not consider the NUMA architecture and therefore failed to maximize the throughput of best-effort applications while ensuring the required QoS of latency-critical applications. Our investigation shows that the NUMA effect has complex impacts on both the latency of latency-critical applications and the throughput of best-effort applications. We therefore propose PAC, a preference-aware co-location scheduling scheme for heterogeneous NUMA architectures that takes the NUMA effect into account. PAC consists of a performance monitor and a core scheduler. The performance monitor identifies the "dangerous" latency-critical applications, that is, those whose core allocations need to be upgraded. For the scheduler, we propose two low-overhead scheduling strategies that identify the bottlenecks of applications and adjust core allocations accordingly. Experimental results show that PAC improves the throughput of best-effort applications by 3.87× while ensuring the required QoS of latency-critical applications.
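
The monitor/scheduler split described in the abstract can be illustrated with a small sketch. This is not PAC's actual algorithm: the abstract does not specify its two strategies, so the code below assumes a simple two-class core model (`big` or not), a per-application `preferred_node`, and a fixed `DANGER_RATIO` threshold. All of these names and values are hypothetical and chosen only to show how a "dangerous" latency-critical application might be detected and given a core taken from a best-effort application.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Core:
    cid: int
    node: int   # NUMA node id
    big: bool   # True for a high-performance core (hypothetical two-class model)


@dataclass
class App:
    name: str
    latency_critical: bool
    qos_target_ms: float = 0.0     # only meaningful for latency-critical apps
    tail_latency_ms: float = 0.0   # most recent measured tail latency
    preferred_node: int = 0        # hypothetical NUMA-node preference
    cores: list = field(default_factory=list)


DANGER_RATIO = 0.9  # assumed threshold, not taken from the paper


def dangerous_apps(apps):
    """Monitor step: flag latency-critical apps close to or past their QoS target."""
    return [a for a in apps
            if a.latency_critical
            and a.tail_latency_ms >= DANGER_RATIO * a.qos_target_ms]


def upgrade(lc, be):
    """Scheduler step: move one core from a best-effort app to a dangerous app.

    Prefers a core on the app's preferred NUMA node, then a big core.
    Returns the moved core, or None if the best-effort app has no cores.
    """
    if not be.cores:
        return None
    best = max(be.cores, key=lambda c: (c.node == lc.preferred_node, c.big))
    be.cores.remove(best)
    lc.cores.append(best)
    return best


def schedule_once(apps):
    """One control interval: upgrade each dangerous app using best-effort cores."""
    be_apps = [a for a in apps if not a.latency_critical]
    moves = []
    for lc in dangerous_apps(apps):
        # Take the core from the best-effort app that currently holds the most cores.
        donor = max(be_apps, key=lambda a: len(a.cores), default=None)
        if donor is None:
            break
        core = upgrade(lc, donor)
        if core is not None:
            moves.append((lc.name, core.cid))
    return moves
```

A real scheduler would also need the reverse path (returning cores to best-effort applications when load drops) and a way to apply the allocation, for example through CPU affinity; both are omitted here.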

Index Terms

Computer systems organization → Architectures → Other architectures → Heterogeneous (hybrid) systems

Published in
ICS '23: Proceedings of the 37th International Conference on Supercomputing
June 2023
505 pages
ISBN: 9798400700569
DOI: 10.1145/3577193
Chair: Kyle Gallivan
Co-chair: Efstratios Gallopoulos
Program Co-chairs: Dimitrios S. Nikolopoulos, Ramon Beivide
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
heterogeneous cores
NUMA architectures
core scheduling
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 584 of 2,055 submissions, 28%