Decentralized Scheduling for Data-Parallel Tasks in the Cloud

Published: 08 June 2024

Abstract

For latency-sensitive data processing applications in the cloud, concurrent data-parallel tasks need to be scheduled and processed quickly. A data-parallel task usually consists of a set of sub-tasks that generate a set of flows, collectively referred to as a coflow. State-of-the-art schedulers collect coflow information in the cloud to optimize coflow-level performance. However, most coflows are small coflows, that is, coflows consisting only of short flows, and these have been largely overlooked. This article presents OptaX, a decentralized network scheduling service that collaboratively schedules the small coflows of data-parallel tasks. OptaX adopts a cross-layer design, compatible with commercial off-the-shelf switches, that leverages sendbuffer information in the kernel to adaptively optimize flow scheduling in the network. Specifically, OptaX (i) monitors system calls (syscalls) on the hosts to obtain their sendbuffer footprints, and (ii) recognizes small coflows and assigns high priorities to their flows. OptaX transfers these flows in a FIFO manner by adjusting two TCP attributes: window size and round-trip time. We have implemented OptaX as a Linux kernel module. The evaluation shows that OptaX is at least 2.2× faster than fair sharing and 1.2× faster than only assigning the highest priority to small coflows. We further apply OptaX to improve the small I/O performance of Ursa, a distributed block storage system that provides virtual disks where small I/O is dominant. Ursa with OptaX achieves a significant improvement in small I/O latency over the original Ursa.
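
The two steps named in the abstract, recognizing small coflows from sendbuffer footprints and prioritizing their flows in FIFO order, can be illustrated with a minimal Python sketch. It is not the OptaX kernel module: the `Flow` record, the `SHORT_FLOW_BYTES` cutoff, the two priority levels, and the `schedule` function are hypothetical names and values chosen for illustration, and the sketch does not model how OptaX adjusts TCP window size or round-trip time.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical cutoff (not from the article): a flow whose observed
# sendbuffer footprint is at or below this many bytes counts as "short".
SHORT_FLOW_BYTES = 100 * 1024

# Hypothetical two-level priority scheme; lower value is served first.
HIGH_PRIORITY = 0
LOW_PRIORITY = 1


@dataclass
class Flow:
    flow_id: str
    coflow_id: str
    sendbuffer_bytes: int  # footprint observed from send-side syscalls
    arrival_seq: int       # order in which the flow was first observed


def is_small_coflow(flows: List[Flow], threshold: int = SHORT_FLOW_BYTES) -> bool:
    """A coflow is small only if every one of its flows is short."""
    return all(f.sendbuffer_bytes <= threshold for f in flows)


def schedule(flows: List[Flow], threshold: int = SHORT_FLOW_BYTES) -> List[Tuple[str, int]]:
    """Return (flow_id, priority) pairs in the order the flows would be served."""
    # Group flows by the coflow they belong to.
    by_coflow: Dict[str, List[Flow]] = {}
    for f in flows:
        by_coflow.setdefault(f.coflow_id, []).append(f)

    # Step (ii): recognize small coflows.
    small = {cid for cid, members in by_coflow.items()
             if is_small_coflow(members, threshold)}

    def priority(f: Flow) -> int:
        return HIGH_PRIORITY if f.coflow_id in small else LOW_PRIORITY

    # High-priority flows first; FIFO (arrival order) within each priority.
    ordered = sorted(flows, key=lambda f: (priority(f), f.arrival_seq))
    return [(f.flow_id, priority(f)) for f in ordered]


if __name__ == "__main__":
    demo = [
        Flow("a1", "A", 10 * 1024, 0),
        Flow("b1", "B", 5 * 1024 * 1024, 1),  # one long flow makes coflow B large
        Flow("a2", "A", 20 * 1024, 2),
        Flow("b2", "B", 4 * 1024, 3),
        Flow("c1", "C", 8 * 1024, 4),
    ]
    for flow_id, prio in schedule(demo):
        print(flow_id, prio)
```

In this sketch a single long flow is enough to disqualify its whole coflow, which follows the abstract's definition of a small coflow as one consisting only of short flows; the short flow `b2` therefore stays at low priority with the rest of coflow B.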

Published In

ACM Transactions on Parallel Computing, Volume 11, Issue 2
June 2024
164 pages
EISSN: 2329-4957
DOI: 10.1145/3613599

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 June 2024
Online AM: 16 March 2024
Accepted: 28 February 2024
Revised: 28 February 2024
Received: 03 March 2023
Published in TOPC Volume 11, Issue 2

Author Tags

  1. Decentralized scheduling
  2. data-parallel tasks
  3. coflows
  4. cross-layer scheduling

Qualifiers

  • Research-article

Funding Sources

  • National Key Research and Development Program of China
  • Natural Science Foundation of Hunan Province

