DOI: 10.1145/3577193.3593709
research-article

PAC: Preference-Aware Co-location Scheduling on Heterogeneous NUMA Architectures To Improve Resource Utilization

Published: 21 June 2023

ABSTRACT

Latency-critical applications interact directly with end users and often experience a diurnal load pattern. In production, best-effort applications are often co-located with them to use the cores that sit idle at low load. Meanwhile, modern computers are evolving towards heterogeneous NUMA architectures, in which cores differ in computation ability, memory access latency, and network communication delay. Prior co-location scheduling work did not consider the NUMA architecture, and so failed to maximize the throughput of best-effort applications while ensuring the required QoS of latency-critical applications. Our investigation shows that the NUMA effect has complex impacts on both the latency of latency-critical applications and the throughput of best-effort applications. We therefore propose PAC, a preference-aware co-location scheduling scheme for heterogeneous NUMA architectures that takes the NUMA effect into account. PAC consists of a performance monitor and a core scheduler. The performance monitor identifies the "dangerous" latency-critical applications, that is, those whose core allocations need to be upgraded. For the scheduler, we propose two low-overhead scheduling strategies that identify the bottlenecks of applications and adjust core allocations accordingly. Experimental results show that PAC improves the throughput of best-effort applications by 3.87× while ensuring the required QoS of latency-critical applications.
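
The monitor-and-scheduler split described in the abstract can be illustrated with a minimal feedback-loop sketch. This is not PAC's algorithm: the abstract does not specify how "dangerous" applications are detected or how cores are chosen, and the sketch ignores NUMA preference entirely. The names `LCApp`, `is_dangerous`, and `rebalance`, and the thresholds `danger_ratio` and `safe_ratio`, are all hypothetical and introduced only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LCApp:
    """A latency-critical application as seen by a hypothetical monitor."""
    name: str
    qos_target_ms: float    # required tail-latency bound
    tail_latency_ms: float  # most recent measured tail latency
    cores: int              # cores currently allocated


def is_dangerous(app: LCApp, danger_ratio: float = 0.9) -> bool:
    # Assumed rule: an app is "dangerous" once its latency reaches
    # danger_ratio of its QoS target. The paper's criterion may differ.
    return app.tail_latency_ms >= danger_ratio * app.qos_target_ms


def rebalance(apps: list[LCApp], be_cores: int,
              danger_ratio: float = 0.9, safe_ratio: float = 0.5) -> int:
    """Run one scheduling step and return the cores left for best-effort work."""
    for app in apps:
        if is_dangerous(app, danger_ratio) and be_cores > 0:
            # Upgrade the allocation by taking one core from best-effort work.
            app.cores += 1
            be_cores -= 1
        elif app.tail_latency_ms < safe_ratio * app.qos_target_ms and app.cores > 1:
            # Plenty of slack: hand one core back to best-effort work.
            app.cores -= 1
            be_cores += 1
    return be_cores


# Hypothetical measurements: lc_a is close to its target, lc_b has slack.
apps = [LCApp("lc_a", 10.0, 9.5, 4), LCApp("lc_b", 20.0, 5.0, 4)]
remaining = rebalance(apps, be_cores=8)
```

In this example `lc_a` gains a core and `lc_b` gives one up, so the best-effort pool stays at 8 cores. A NUMA-aware scheduler would additionally have to decide which core to move, since cores on different nodes are not interchangeable.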


    • Published in

      ICS '23: Proceedings of the 37th International Conference on Supercomputing
      June 2023
      505 pages
      ISBN: 9798400700569
      DOI: 10.1145/3577193

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall acceptance rate: 584 of 2,055 submissions (28%)