Abstract
While researchers have proposed many techniques to mitigate contention on the shared cache and memory bandwidth, none of them has considered the memory bus contention caused by split locks. Our study shows that a split lock may cause 9X longer data access latency without saturating the memory bandwidth. To minimize the impact of split locks, we propose Kronos, a runtime system composed of an online bus contention tolerance meter and a bus contention-aware job scheduler. The meter characterizes the tolerance of jobs to the “pressure” of bus contention and builds a tolerance model using polynomial regression. The job scheduler allocates user jobs to physical nodes in a contention-aware manner. We design three scheduling policies: the first minimizes the number of required nodes while ensuring the Service Level Agreement (SLA) of all user jobs, the second minimizes the number of jobs that suffer SLA violations when there are not enough nodes, and the third maximizes the overall performance without considering SLA violations. With these three policies, Kronos reduces the number of required nodes by 42.1% while ensuring the SLA of all jobs, reduces the number of jobs that suffer SLA violations when there are not enough nodes by 72.8%, and improves the overall performance by 35.2% when SLA is not considered.
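As a rough illustration of the tolerance-model idea, the sketch below fits a polynomial to (bus pressure, slowdown) samples by least squares and then looks up the highest pressure a job could tolerate under a slowdown bound. It is not the Kronos implementation: the function names, the synthetic samples, the degree-2 polynomial and the 1.5 slowdown bound are all assumptions made for this example.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns [c0, c1, ...] for c0 + c1*x + ..."""
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def max_tolerable_pressure(coeffs, sla_slowdown, candidates):
    """Largest candidate pressure whose predicted slowdown stays within the bound."""
    ok = [p for p in candidates if predict(coeffs, p) <= sla_slowdown]
    return max(ok) if ok else None


# Synthetic samples (assumption): slowdown grows quadratically with pressure.
pressures = [0.0, 1.0, 2.0, 3.0, 4.0]
slowdowns = [1.0 + 0.1 * p * p for p in pressures]

model = fit_polynomial(pressures, slowdowns, degree=2)
limit = max_tolerable_pressure(model, 1.5, [i / 10 for i in range(41)])
print(model, limit)
```

A scheduler could use such a per-job limit to decide which jobs may share a node, although the abstract does not specify how Kronos combines the model with its three policies.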
Acknowledgements
This work was partially sponsored by the National R&D Program of China (2018YFB1004800), the National Natural Science Foundation of China (Grant Nos. 62022057, 61632017, 61832006) and Alibaba Group. Quan Chen and Minyi Guo are the corresponding authors. We thank Chao Qian for his collaborative effort during data collection. We also thank the anonymous reviewers, who provided helpful comments on earlier drafts of the manuscript.
Author information
Shuai Xue received his BSc degree from Northwestern Polytechnical University, China. He is currently an MSc student in the field of computer science under the supervision of Dr. Quan Chen in the Department of Computer Engineering Faculty of Shanghai Jiao Tong University, China. His research interests include high performance computing and resource management in datacenters.
Shang Zhao received his BSc degree from Shanghai Jiao Tong University, China. He is currently a graduate student in the field of computer science under the supervision of Dr. Quan Chen in the Department of Computer Science and Engineering of Shanghai Jiao Tong University, China. His research interests include cloud computing and operating systems in datacenters.
Quan Chen is a tenure-track associate professor in the Department of Computer Science and Engineering, Shanghai Jiao Tong University, China. His research interests include high performance computing, task scheduling in various architectures, resource management in datacenters, runtime systems and operating systems. He received his PhD degree in June 2014 from the Department of Computer Science and Engineering, Shanghai Jiao Tong University, China.
Zhuo Song received his BS and MS degrees from Wuhan University, China. He is a Staff Engineer at Alibaba, China, working on network optimization and development including but not limited to drivers, the network stack and related subsystems, containers and kernel bypass technologies. He now mainly focuses on software and hardware co-design technologies for cloud and OS architecture.
Shanpei Chen received his BS degree in network engineering from Dalian University of Technology, China in 2010, and his Master’s degree in computer science from Peking University, China in 2013. He now works for Alibaba, China and focuses on system software technology, including schedulers, resource isolation, performance evaluation and optimization.
Tao Ma received his MS degree from China University of Geosciences, China. He is a Principal Engineer at Alibaba, China, and has more than 15 years of software development experience in operating systems, computer architecture and database systems. His major focus is on operating systems for data centers, cloud infrastructure, cloud native systems and high performance computing.
Yong Yang received his BS degree from Inner Mongolia University of Technology, China. He is a Senior Staff Engineer at Alibaba, China and has rich development experience in the system software, server and storage industries. Currently, his major focus is on infrastructure software development, which includes server operating system and hardware appliance development in the cloud. He also has a special interest in system performance and QoS in data centers and the cloud.
Wenli Zheng is an assistant professor with the Computer Science and Engineering Department of Shanghai Jiao Tong University, China. He received his PhD degree from The Ohio State University, USA in 2016, and has been working in the area of computer systems since 2012. Specifically, his research interests include parallel and distributed computing, as well as power management of computer systems.
Minyi Guo received his PhD degree in computer science from the University of Tsukuba, Japan. He is currently Zhiyuan Chair Professor and head of the Department of Computer Science and Engineering, Shanghai Jiao Tong University, China. His present research interests include parallel/distributed computing, compiler optimizations, embedded systems, pervasive computing, big data and cloud computing. He is now on the editorial boards of IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Cloud Computing and Journal of Parallel and Distributed Computing. Dr. Guo is a fellow of IEEE and a fellow of CCF.
Cite this article
Xue, S., Zhao, S., Chen, Q. et al. Kronos: towards bus contention-aware job scheduling in warehouse scale computers. Front. Comput. Sci. 17, 171101 (2023). https://doi.org/10.1007/s11704-021-0418-5
DOI: https://doi.org/10.1007/s11704-021-0418-5