Abstract
In large-scale data centers, many latency-sensitive cloud applications follow a partition-aggregate pattern. A single job requires responses from thousands of software services, so the tail latency of each participating task must stay within tens to hundreds of microseconds for the job to respond quickly to user operations. To meet this requirement, researchers have proposed a series of microsecond-level task scheduling algorithms. However, these algorithms focus primarily on intra-server scheduling and overlook the influence of inter-server scheduling. Furthermore, they do not account for inconsistency in the order in which tasks of different jobs execute across multiple servers, which can delay job completion and, in turn, responses to users. To address these problems, this paper proposes CLLSched, a two-level consistent low-latency scheduling algorithm for microsecond-level tasks that aims to achieve both low tail completion time and high CPU utilization on servers. Across servers, CLLSched employs a server selection strategy based on the power-of-k-choice principle. Within each server, it implements a fine-grained dynamic core allocation strategy based on task types. In simulations, compared with the state-of-the-art RackSched, CLLSched reduces tail latency for short tasks by 2.78x, improves CPU utilization by 1.14x, and increases cluster throughput by 1.23x. More importantly, it reduces job completion times by at least 60%.
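The power-of-k-choice principle mentioned in the abstract can be illustrated with a minimal sketch. This is not CLLSched's actual selection logic, which the abstract does not detail. It assumes queue length is the load signal, and the names `pick_server` and `simulate` are hypothetical: the dispatcher samples k servers at random and sends the task to the least loaded of them.

```python
import random


def pick_server(queue_lengths, k, rng=random):
    """Sample k distinct servers uniformly at random and return the
    index of the one with the shortest queue (assumed load signal)."""
    k = min(k, len(queue_lengths))
    candidates = rng.sample(range(len(queue_lengths)), k)
    return min(candidates, key=lambda i: queue_lengths[i])


def simulate(num_servers, num_tasks, k, seed=0):
    """Dispatch num_tasks tasks one by one and return the final queue lengths.

    Simplification for illustration only: tasks never complete, so queues
    only grow. A real scheduler would also drain them over time.
    """
    rng = random.Random(seed)
    queues = [0] * num_servers
    for _ in range(num_tasks):
        queues[pick_server(queues, k, rng)] += 1
    return queues


if __name__ == "__main__":
    for k in (1, 2, 4):
        queues = simulate(num_servers=100, num_tasks=10_000, k=k)
        print(f"k={k}: longest queue = {max(queues)}")
```

With k = 1 the policy degenerates to purely random placement, while k equal to the number of servers always picks a globally shortest queue. Intermediate values trade probing cost against load balance, which is the usual motivation for sampling only a few candidates.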
References
Memcached key-value store. http://memcached.org/
Redis data structure store. http://redis.io/
RocksDB. http://rocksdb.org/
Belay, A., Prekas, G., Primorac, M., et al.: The IX operating system: combining low latency, high throughput, and efficiency in a protected dataplane. ACM Trans. Comput. Syst. (TOCS) 34(4), 1–39 (2016)
Prekas, G., et al.: ZygOS: achieving low tail latency for microsecond-scale networked tasks. In: 26th Symposium on Operating Systems Principles, pp. 325–341 (2017)
Qin, H., et al.: Arachne: core-aware thread management. In: 13th USENIX Symposium on Operating Systems Design and Implementation, pp. 145–160 (2018)
Ousterhout, A., et al.: Shenango: achieving high CPU efficiency for latency-sensitive datacenter workloads. In: 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI), pp. 361–378 (2019)
Fried, J., et al.: Caladan: mitigating interference at microsecond timescales. In: 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp. 281–297 (2020)
Kaffes, K., et al.: Shinjuku: preemptive scheduling for μsecond-scale tail latency. In: 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI), pp. 345–360 (2019)
Gog, I., et al.: Firmament: fast, centralized cluster scheduling at scale. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp. 99–115 (2016)
Chen, S., Delimitrou, C., Martínez, J.F.: PARTIES: QoS-aware resource partitioning for multiple interactive services. In: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 107–120 (2019)
Cho, I., et al.: Overload control for μs-scale RPCs with Breakwater. In: 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp. 299–314 (2020)
Zhu, H., et al.: RackSched: a microsecond-scale scheduler for rack-scale computers. In: 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp. 1225–1240 (2020)
Demoulin, H.M., et al.: When idling is ideal: optimizing tail-latency for heavy-tailed datacenter workloads with Perséphone. In: Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles, pp. 621–637 (2021)
Mitzenmacher, M.: The power of two choices in randomized load balancing. IEEE Trans. Parallel Distrib. Syst. 12(10), 1094–1104 (2001)
Tan, L., Su, W., Zhang, W., et al.: In-band network telemetry: a survey. Comput. Netw. 186, 107763 (2021)
McClure, S., et al.: Efficient scheduling policies for microsecond-scale tasks. In: 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI), pp. 1–18 (2022)
Atikoglu, B., Xu, Y.: Workload analysis of a large-scale key-value store. In: Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, pp. 53–64 (2012)
Cooper, B.F., Silberstein, A.: Benchmarking cloud serving systems with YCSB. In: Proceedings of the 1st ACM Symposium on Cloud Computing, pp. 143–154 (2010)
Acknowledgments
The authors gratefully acknowledge the anonymous reviewers for their constructive comments. This work is supported in part by the Fundamental Research Funds for the Central Universities under Grant No. NS2023049 and by the National Natural Science Foundation of China (NSFC) under Grant No. 62132007.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yu, Q., Zhang, T., Yi, C. (2025). Consistent Low-Latency Scheduling for Microsecond-Scale Tasks in Data Centers. In: Cai, Z., Takabi, D., Guo, S., Zou, Y. (eds) Wireless Artificial Intelligent Computing Systems and Applications. WASA 2024. Lecture Notes in Computer Science, vol 14999. Springer, Cham. https://doi.org/10.1007/978-3-031-71470-2_9
DOI: https://doi.org/10.1007/978-3-031-71470-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-71469-6
Online ISBN: 978-3-031-71470-2
eBook Packages: Computer Science, Computer Science (R0)