ABSTRACT
We present a simple, near-optimal, randomized parallel algorithm for scheduling packets in routers based on the Switch-Memory-Switch (SMS) architecture, which emulates output queuing by buffering packets in a collection of small memories within the switch and forms the basis of the fastest routers in use today. For a router with N inputs and N outputs, our algorithm computes the schedule in O(log* N) rounds, where a round consists of exchanging a few bits between the input ports and the memories together with simple local computation at the inputs and memories. Furthermore, by using an O(log* N)-deep pipeline at each input, our algorithm computes the schedule in a constant number of rounds. The pipelined algorithm is simple and achieves optimal (i.e., constant) throughput with a small O(log* N) delay. We show that the total amount of buffer memory required by our algorithm is close to the minimum required. We also show that the number of buffer memories is within an additive εN of 2N − 1 for any constant ε > 0 (and within an additive o(N) for the basic scheduler), where 2N − 1 is the minimum number of memories needed under adversarial placement of packets. Furthermore, we show that the number of memories we use beyond N, the minimum required in the offline version, is within a constant factor of the minimum number of extra memories required by any online scheduler, even one that is allowed to fail occasionally. Our scheduling algorithm is randomized and works with high probability in N. We also prove that it is self-stabilizing, i.e., it resumes its normal behavior after occasional lapses caused by its probabilistic nature.
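Why 2N − 1 memories suffice under adversarial placement can be seen with a pigeonhole argument, sketched here under assumed constraints: each memory accepts at most one write and serves at most one read per time slot, and every packet's departure slot is known when it arrives, as when emulating an output-queued switch. A packet arriving in slot t with departure slot d must avoid the at most N − 1 memories written by the other packets arriving in slot t, and the at most N − 1 memories already holding a packet that departs in slot d. Among 2N − 1 memories, at least one is therefore always eligible. The Python below is a sequential greedy illustration of that argument only, not the paper's randomized parallel O(log* N)-round scheduler; the traffic model and the names `oq_departures` and `greedy_place` are hypothetical and introduced for this sketch.

```python
import random


def oq_departures(num_ports, num_slots, seed=0):
    """Hypothetical traffic: one packet per input per slot, random output.

    Each packet departs one slot after the previous packet to its output (and
    no earlier than the slot after it arrives), so every output sends at most
    one packet per slot, as an ideal output-queued switch would.
    Returns a list whose entry t lists the departure slots of slot t's arrivals.
    """
    rng = random.Random(seed)
    last = [-1] * num_ports  # last departure slot assigned at each output
    slots = []
    for t in range(num_slots):
        deps = []
        for _ in range(num_ports):
            out = rng.randrange(num_ports)
            d = max(t + 1, last[out] + 1)
            last[out] = d
            deps.append(d)
        slots.append(deps)
    return slots


def greedy_place(num_memories, slots):
    """Assign every arriving packet to a buffer memory, one packet at a time.

    Assumed constraints: a memory takes at most one write per slot, and it may
    never hold two packets with the same departure slot (one read per slot).
    Departed packets are never evicted; this is harmless because a new
    arrival's departure slot is always later than every slot already passed.
    Raises RuntimeError if some packet has no eligible memory.
    """
    holds = [set() for _ in range(num_memories)]  # departure slots stored per memory
    placement = []
    for t, deps in enumerate(slots):
        written = set()  # memories already written in slot t
        row = []
        for d in deps:
            m = next((k for k in range(num_memories)
                      if k not in written and d not in holds[k]), None)
            if m is None:
                raise RuntimeError(f"no eligible memory for a packet in slot {t}")
            written.add(m)
            holds[m].add(d)
            row.append(m)
        placement.append(row)
    return placement


if __name__ == "__main__":
    N = 8
    slots = oq_departures(N, num_slots=200, seed=1)
    placement = greedy_place(2 * N - 1, slots)  # cannot fail: pigeonhole bound
    print("distinct memories used:", len({m for row in placement for m in row}))
```

Because each greedy choice depends on the choices made earlier in the same slot, this loop is inherently sequential and does not by itself yield a fast parallel schedule.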
Index Terms
- A near optimal scheduler for switch-memory-switch routers