Abstract
Achieving the minimum average coflow completion time (CCT) while providing isolation guarantees in multi-tenant datacenters is challenging when coflow sizes are not known in advance. State-of-the-art solutions focus either on minimizing the average CCT or on providing optimal isolation guarantees; achieving both at once is difficult because the two objectives conflict. We therefore propose FIGCS-TF (Fast and Isolation Guarantees Coflow Scheduling via Traffic Forecasting), a coflow scheduling algorithm that requires no prior knowledge of coflow sizes. FIGCS-TF uses a lightweight forecasting module to predict the relative scheduling priority of coflows. For bandwidth allocation it employs the MDRF (monopolistic dominant resource fairness) strategy, which operates on super-coflows and helps achieve long-term isolation. In trace-driven simulations, FIGCS-TF completes communication stages 1.12\(\times\), 1.99\(\times\), and 5.50\(\times\) faster than DRF (Dominant Resource Fairness), NCDRF (Non-Clairvoyant Dominant Resource Fairness), and Per-Flow Fairness, respectively. Compared with the theoretically minimum CCT, FIGCS-TF incurs only a 46% increase in average CCT at the 95th percentile of the dataset. Overall, FIGCS-TF reduces the average CCT more than the other algorithms it is compared against.
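As a reading aid for the metrics above, the following minimal Python sketch computes a coflow's CCT as the finish time of its slowest flow minus its arrival time, then expresses a speedup as a ratio of average CCTs. The `Coflow` class, the function names, and the sample numbers are hypothetical illustrations; whether the reported 1.12\(\times\), 1.99\(\times\), and 5.50\(\times\) factors are computed exactly this way is an assumption, not something the abstract states.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Coflow:
    """Hypothetical record of one coflow's observed timings (seconds)."""
    arrival: float
    flow_finish_times: List[float]

    def cct(self) -> float:
        # A coflow completes only when its slowest flow finishes.
        return max(self.flow_finish_times) - self.arrival


def average_cct(coflows: List[Coflow]) -> float:
    # Mean completion time over all coflows in a run.
    return sum(c.cct() for c in coflows) / len(coflows)


def speedup(baseline: List[Coflow], candidate: List[Coflow]) -> float:
    # Assumed definition: ratio of average CCTs; values above 1 mean
    # the candidate scheduler finishes coflows faster on average.
    return average_cct(baseline) / average_cct(candidate)


# Made-up timings for the same two coflows under two schedulers.
baseline_run = [Coflow(0.0, [4.0, 6.0]), Coflow(1.0, [5.0, 9.0])]
candidate_run = [Coflow(0.0, [2.0, 3.0]), Coflow(1.0, [4.0, 5.0])]

print(average_cct(baseline_run))
print(average_cct(candidate_run))
print(speedup(baseline_run, candidate_run))
```

Taking the maximum over a coflow's flows is what separates CCT from per-flow completion time: speeding up every flow but the slowest leaves the CCT unchanged.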














Data Availability
No datasets were generated or analysed during the current study.
Funding
This study was supported by the Natural Science Foundation of Shandong Province, China (Grant Nos. ZR2022QF143 and ZR2021QF130), the Shaanxi Key Laboratory of Information Communication Network and Security (Xi’an University of Posts and Telecommunications) open project (Grant No. ICNS202202), the Hubei Key Laboratory of Intelligent Robot (Wuhan Institute of Technology) open project (Grant No. HBIR202201), and the Wuhan knowledge innovation special project (Grant No. 30106230186).
Author information
Contributions
C.L. wrote the main manuscript text and designed the model of the manuscript. H.Z., Y.F. and S.H. collected the data and performed the analysis.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Li, C., Zhang, H., Yang, F. et al. Fast and isolation guaranteed coflow scheduling via traffic forecasting in multi-tenant environment. J Supercomput 80, 26726–26750 (2024). https://doi.org/10.1007/s11227-024-06457-3
DOI: https://doi.org/10.1007/s11227-024-06457-3