DOI: 10.1145/3098822.3098840

Credit-Scheduled Delay-Bounded Congestion Control for Datacenters

Published: 07 August 2017

Abstract

Small RTTs (~tens of microseconds), bursty flow arrivals, and a large number of concurrent flows (thousands) in datacenters bring fundamental challenges to congestion control, as they either force a flow to send at most one packet per RTT or induce a large queue build-up. The widespread use of shallow-buffered switches makes the problem harder still when hosts generate many flows in bursts. In addition, as link speeds increase, algorithms that gradually probe for bandwidth take a long time to reach the fair share. An ideal datacenter congestion control must provide 1) zero data loss, 2) fast convergence, 3) low buffer occupancy, and 4) high utilization. However, these requirements present conflicting goals.
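To make the "at most one packet per RTT" pressure concrete, the following back-of-the-envelope calculation splits a link's bandwidth-delay product evenly across concurrent flows. The specific numbers (a 10 Gbps link, a 50 microsecond RTT, 1,000 flows, 1,500-byte packets) and the function name are illustrative assumptions chosen to fall within the ranges the abstract mentions; they are not taken from the paper.

```python
def fair_share_packets_per_rtt(link_gbps, rtt_us, n_flows, packet_bytes=1500):
    """Packets each flow may send per RTT if the link's
    bandwidth-delay product (BDP) is divided evenly among flows.

    All parameter values used below are illustrative assumptions.
    """
    # Gbps * microseconds = 1e9 * 1e-6 = 1e3 bits, so scale by 1000.
    bdp_bits = link_gbps * 1000 * rtt_us
    bdp_bytes = bdp_bits / 8
    return bdp_bytes / packet_bytes / n_flows


# Hypothetical scenario: 10 Gbps link, 50 us RTT, 1,000 concurrent flows.
share = fair_share_packets_per_rtt(link_gbps=10, rtt_us=50, n_flows=1000)
print(f"fair share: {share:.4f} packets per RTT")
```

Under these assumptions the whole link holds only about 42 full-size packets in flight, so each flow's fair share is well below one packet per RTT, which a window-based sender cannot express without either overshooting or idling.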
This paper presents a radically new approach, called ExpressPass, an end-to-end credit-scheduled, delay-bounded congestion control for datacenters. ExpressPass uses credit packets to control congestion even before sending data packets, which enables us to achieve bounded delay and fast convergence. It gracefully handles bursty flow arrivals. We implement ExpressPass using commodity switches and provide evaluations using testbed experiments and simulations. ExpressPass converges up to 80 times faster than DCTCP on 10 Gbps links, and the gap increases as link speeds become faster. It greatly improves performance under heavy incast workloads and significantly reduces flow completion times, especially for small and medium-sized flows, compared to RCP, DCTCP, HULL, and DX under realistic workloads.
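The idea of controlling congestion with credits before any data is sent can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the ExpressPass algorithm: it assumes a single bottleneck, discrete time ticks, one credit per flow per tick, and a bottleneck that forwards a fixed number of credits per tick and drops the rest. The function name `simulate` and all parameters are hypothetical, and the sketch omits any feedback control of the credit sending rate.

```python
import random


def simulate(n_flows=8, link_capacity=4, ticks=1000, seed=0):
    """Toy credit loop on one bottleneck link (illustrative only).

    Each tick, every receiver emits one credit toward its sender. The
    bottleneck forwards at most `link_capacity` credits per tick and
    drops the rest. Each sender transmits exactly one data packet per
    credit it receives, so the data arriving at the bottleneck in any
    tick cannot exceed `link_capacity`.
    """
    rng = random.Random(seed)
    sent = [0] * n_flows
    max_data_per_tick = 0
    for _ in range(ticks):
        credits = list(range(n_flows))      # one credit per flow
        rng.shuffle(credits)                # random order stands in for arrival jitter
        admitted = credits[:link_capacity]  # credit rate limit; excess credits are dropped
        for flow in admitted:
            sent[flow] += 1                 # one data packet per admitted credit
        max_data_per_tick = max(max_data_per_tick, len(admitted))
    return sent, max_data_per_tick


sent, peak = simulate()
print("data packets per flow:", sent)
print("peak data packets per tick:", peak)
```

In this model only credits are ever dropped; the data load on the bottleneck is bounded by construction, which is the intuition behind bounded queuing delay. How the real system sizes, schedules, and adapts credits is described in the paper itself.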

Supplementary Material

WEBM File (creditscheduleddelayboundedcongestioncontrolfordatacenters.webm)

    Published In

    SIGCOMM '17: Proceedings of the Conference of the ACM Special Interest Group on Data Communication
    August 2017
    515 pages
    ISBN:9781450346535
    DOI:10.1145/3098822
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Congestion Control
    2. Credit-based
    3. Datacenter Network

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SIGCOMM '17: ACM SIGCOMM 2017 Conference
    August 21 - 25, 2017
    Los Angeles, CA, USA

    Acceptance Rates

    Overall Acceptance Rate 462 of 3,389 submissions, 14%

    Article Metrics

    • Downloads (Last 12 months): 636
    • Downloads (Last 6 weeks): 62
    Reflects downloads up to 07 Mar 2025

    Cited By

    • (2025) EDM: An Ultra-Low Latency Ethernet Fabric for Memory Disaggregation. Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, pp. 377-394. DOI: 10.1145/3669940.3707221. Online publication date: 30-Mar-2025
    • (2025) GraphCC: A practical graph learning-based approach to Congestion Control in datacenters. Computer Networks 257 (110981). DOI: 10.1016/j.comnet.2024.110981. Online publication date: Feb-2025
    • (2025) A Survey on Congestion Control in Large Data Centers. Power Devices and Internet of Things for Intelligent System Design, pp. 25-86. DOI: 10.1002/9781394311613.ch2. Online publication date: 13-Feb-2025
    • (2024) Revisiting congestion control for lossless ethernet. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, pp. 131-148. DOI: 10.5555/3691825.3691833. Online publication date: 16-Apr-2024
    • (2024) Flow scheduling with imprecise knowledge. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, pp. 95-111. DOI: 10.5555/3691825.3691831. Online publication date: 16-Apr-2024
    • (2024) Opportunistic Packet Forwarding for Proactive Transport in Datacenters. 2024 IFIP Networking Conference (IFIP Networking), pp. 1-9. DOI: 10.23919/IFIPNetworking62109.2024.10619903. Online publication date: 3-Jun-2024
    • (2024) PB-FS: Postcard-Based Fast Start. 2024 IFIP Networking Conference (IFIP Networking), pp. 86-94. DOI: 10.23919/IFIPNetworking62109.2024.10619894. Online publication date: 3-Jun-2024
    • (2024) POSTER: Opportunistic Credit-Based Transport for Reconfigurable Data Center Networks with Tidal. Proceedings of the ACM SIGCOMM 2024 Conference: Posters and Demos, pp. 4-6. DOI: 10.1145/3672202.3673714. Online publication date: 4-Aug-2024
    • (2024) Rethinking Transport Protocols for Reconfigurable Data Centers: An Empirical Study. Proceedings of the 1st SIGCOMM Workshop on Hot Topics in Optical Technologies and Applications in Networking, pp. 7-13. DOI: 10.1145/3672201.3674120. Online publication date: 4-Aug-2024
    • (2024) COER: A Network Interface Offloading Architecture for RDMA and Congestion Control Protocol Codesign. ACM Transactions on Architecture and Code Optimization 21:3, pp. 1-26. DOI: 10.1145/3660525. Online publication date: 22-Apr-2024
