
Efficient Dynamic Isolation of Congestion in Lossless DataCenter Networks

Published: 14 August 2019

Abstract

The architecture of modern DataCenters (DCs) has evolved to meet the stringent communication-latency requirements of applications. RDMA technologies such as RoCEv2 have become mainstream for reducing latency. However, their performance suffers in lossy networks because of the overhead introduced by packet retransmissions. Lossless networks are therefore increasingly used in DCs to avoid retransmission delays. Lossless networks, however, make congestion more likely, which degrades network and system performance. Traditional congestion-management solutions, such as backpressure or injection throttling, may be ineffective when congestion arises from the traffic generated by DC applications. Hence, new efficient congestion-management strategies suited to the lossless networks of modern DCs are required. In this paper, we analyze congestion and its negative effects in these scenarios. In addition, we propose and evaluate a congestion-management strategy, based on the dynamic isolation of congested flows in special queues, that effectively eliminates the main negative effects of congestion. Unlike previous proposals based on this approach, a single special queue is shared by all the congested flows reaching a port. We also propose enhancements to this basic strategy to optimize its efficiency.
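To make the isolation idea concrete, the following Python sketch contrasts a port with a single FIFO against a port that adds one special queue shared by every flow flagged as congested. This is a toy model under stated assumptions and not the authors' switch design. The flow names, the strict "normal queue first" arbitration, and the externally supplied set of congested flows are all hypothetical. Congestion detection itself is not modeled.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Packet:
    flow: str  # flow identifier (hypothetical: the destination host name)
    seq: int


class FifoPort:
    """Baseline: one FIFO per port, so a blocked head packet stalls everyone."""

    def __init__(self) -> None:
        self.q: deque = deque()

    def enqueue(self, pkt: Packet) -> None:
        self.q.append(pkt)

    def dequeue(self, blocked: set) -> Optional[Packet]:
        # Only the head may leave; if its flow is blocked, nothing moves (HoL blocking).
        if self.q and self.q[0].flow not in blocked:
            return self.q.popleft()
        return None


@dataclass
class IsolatingPort:
    """Normal queue plus ONE special queue shared by all flows flagged as congested.
    How flows get flagged is not modeled; the caller fills `congested` (assumption)."""

    congested: set = field(default_factory=set)
    normal: deque = field(default_factory=deque)
    special: deque = field(default_factory=deque)

    def enqueue(self, pkt: Packet) -> None:
        # Packets of flagged flows are isolated in the shared special queue.
        (self.special if pkt.flow in self.congested else self.normal).append(pkt)

    def dequeue(self, blocked: set) -> Optional[Packet]:
        # Hypothetical arbitration: try the normal queue first, then the special queue.
        for q in (self.normal, self.special):
            if q and q[0].flow not in blocked:
                return q.popleft()
        return None


def drain(port, blocked: set) -> list:
    """Forward packets until the port can make no further progress."""
    sent = []
    while (pkt := port.dequeue(blocked)) is not None:
        sent.append(pkt)
    return sent


arrivals = [Packet("H", 0), Packet("A", 0), Packet("H", 1), Packet("B", 0)]
blocked = {"H"}  # H's destination is congested, so its next hop is paused

fifo, iso = FifoPort(), IsolatingPort(congested={"H"})
for p in arrivals:
    fifo.enqueue(p)
    iso.enqueue(p)

print([p.flow for p in drain(fifo, blocked)])  # []
print([p.flow for p in drain(iso, blocked)])   # ['A', 'B']
```

In this model, flow H is blocked downstream. The FIFO port forwards nothing because H's packet sits at its head. The isolating port still forwards A and B. Sharing one special queue keeps the cost at a single extra queue per port, which, as the abstract notes, distinguishes this approach from earlier isolation proposals. The model also shows the catch: a blocked congested flow can still hold up another congested flow queued behind it in the special queue.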

Supplementary Material

MP4 File (p15-gonzalez-naharro.mp4)

Cited By

  • (2024) A New Mechanism to Identify Congesting Packets in High-Performance Interconnection Networks. 2024 IEEE Symposium on High-Performance Interconnects (HOTI), 24-32. DOI: 10.1109/HOTI63208.2024.00016. Online publication date: 21-Aug-2024.
  • (2024) A Hybrid Solution to Provide End-to-End Flow Control and Congestion Management in High-Performance Interconnection Networks. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 8-17. DOI: 10.1109/CCGrid59990.2024.00011. Online publication date: 6-May-2024.
  • (2021) DVL-Lossy: Isolating Congesting Flows to Optimize Packet Dropping in Lossy Data-Center Networks. IEEE Micro 41, 1, 37-44. DOI: 10.1109/MM.2020.3042263. Online publication date: 1-Jan-2021.
  • (2020) Optimizing Packet Dropping by Efficient Congesting-Flow Isolation in Lossy Data-Center Networks. 2020 IEEE Symposium on High-Performance Interconnects (HOTI), 47-54. DOI: 10.1109/HOTI51249.2020.00022. Online publication date: Aug-2020.

Information

Published In

NEAT'19: Proceedings of the ACM SIGCOMM 2019 Workshop on Networking for Emerging Applications and Technologies
August 2019
61 pages
ISBN:9781450368766
DOI:10.1145/3341558
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Congestion isolation
  2. Datacenter networks
  3. Head-of-line blocking
  4. Shared-buffer switches
  5. Simulation tools

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SIGCOMM '19: ACM SIGCOMM 2019 Conference
August 19, 2019
Beijing, China

Acceptance Rates

NEAT'19 Paper Acceptance Rate 8 of 18 submissions, 44%;
Overall Acceptance Rate 8 of 18 submissions, 44%

Article Metrics

  • Downloads (last 12 months): 21
  • Downloads (last 6 weeks): 1
Reflects downloads up to 08 Mar 2025
