On the necessary and sufficient requirement of a CIOQ switch to emulate an Output Queued switch

doi:10.1016/j.osn.2011.11.006

Optical Switching and Networking

Volume 9, Issue 3, July 2012, Pages 205-213

https://doi.org/10.1016/j.osn.2011.11.006

Abstract

There has been much interest in emulating the behavior of Output Queued switches. An early result of such attempts was reported by Prabhakar and McKeown, using CIOQ switches with a speedup factor of 4. Subsequently, Stoica and Zhang, and independently Chuang et al., showed that a speedup of 2, in conjunction with their scheduling schemes, is sufficient for CIOQ switches to emulate Output Queued switches.

Additionally, Chuang et al. showed that in the “Average Sense” a speedup of $2 - 1/N$ is necessary and sufficient for a CIOQ switch to emulate Output Queued switch behavior.

Our paper reports that in the “Strict Sense” a speedup of 2 is both necessary and sufficient. We illustrate this requirement using examples for 2×2 and 3×3 switches. Then, using a constructed traffic pattern, we prove that in the “Strict Sense” a speedup of 2 is necessary to emulate the behavior of an Output Queued switch for any switch size $N$.

Combining this result with the previous scheduling schemes, we conclude that in the “Strict Sense”, a speedup of 2 is the necessary and sufficient condition to emulate the behavior of an Output Queued switch, using a CIOQ switch.

Additionally, by relaxing the assumptions and allowing packet segmentation, it is shown that the speedup required to emulate the behavior of an Output Queued switch can be reduced to values even smaller than $2 - 1/N$. For this case, a lower bound of 3/2 and an upper bound of 2 are proved.

Introduction

For the past few decades, there have been constant efforts to compare, and to find compromises between, Input Queued switches and Output Queued switches. The groundbreaking results of Karol and Hluchyj [1], which analytically placed the throughput of input buffering at 58.6%, exposed the Head-of-Line (HOL) blocking characteristic of input buffering. Since then, efforts to improve this performance or to remove HOL blocking have continued. Various simple schemes, such as looking ahead in the queues [2], [3], [4], channel grouping [5], [6], [7], [8], a simple speedup factor [9], [10], [11], [12], [13], or Virtual Output Queues [14], [15], have been used to improve the throughput of Input Queued switches. More complicated schemes, such as non-FIFO buffers [4], priority queueing [16], [17], and parallel or cascaded planes [18], [19], have also been presented for this purpose. Scheduling methods such as iSLIP [15], Maximal Matching, PIM, Round Robin [15], [17], [20], [21], LQF (Longest Queue First), OCF (Oldest Cell First), and their variations [22], [23] have been introduced to reach 100% throughput.
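The 58.6% figure is the saturation throughput of a FIFO input-queued switch under uniform traffic in the limit of large $N$, namely $2 - \sqrt{2} \approx 0.586$. The Monte Carlo sketch below is not taken from this paper; it only illustrates HOL blocking under stated assumptions: every input is saturated, each new head-of-line cell picks a uniformly random output, and each output serves one contending HOL cell per timeslot, chosen at random. The function name and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def hol_saturation_throughput(n_ports: int, n_slots: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the saturation throughput of an
    n_ports x n_ports FIFO input-queued switch under uniform traffic.
    Hypothetical helper, not from the paper."""
    rng = random.Random(seed)
    # Inputs are saturated, so each always has a HOL cell; we only need
    # the output port that each input's HOL cell is destined to.
    hol_dest = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        # Group inputs by the output requested by their HOL cell.
        contenders = defaultdict(list)
        for inp, out in enumerate(hol_dest):
            contenders[out].append(inp)
        # Each requested output serves one HOL cell per slot, chosen at
        # random; the losers stay blocked at the head of their FIFO.
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            delivered += 1
            # The cell behind the winner moves to the head with a fresh
            # uniformly random destination.
            hol_dest[winner] = rng.randrange(n_ports)
    return delivered / (n_ports * n_slots)


for n in (2, 8, 32):
    print(n, round(hol_saturation_throughput(n, 20_000), 3))
```

For $N = 2$ the estimate should sit near 0.75, and it drifts down toward $2 - \sqrt{2}$ as $N$ grows; the exact digits depend on the seed and the number of simulated slots.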

These schemes achieve 100% throughput, but only after the system has run for a long time or when the queues are saturated. Because of their limited performance over short time windows, possible unfairness or starvation among ports, and the possibility of large delays, there has been much interest, and there have been many attempts, to emulate the behavior of Output Queued switches with Input Queued switches.

An early result of such attempts was reported by Prabhakar and McKeown [24], who emulated an OQ switch using a Combined Input and Output Queued (CIOQ) switch with a limited speedup factor of 4. The significance of this result lies in its speedup requirement, which should be compared with the speedup of $N$ required by an Output Queued switch of size $N$. Later on, Stoica and Zhang [25] and, independently, Chuang et al. [26], [27] introduced other scheduling schemes and showed that a speedup of 2, in conjunction with their schemes, is sufficient for a CIOQ switch to behave identically to an Output Queued switch. Work on this subject is ongoing [17], [23], [28].

In their widely cited papers [26], [27], Chuang et al. have also shown that in the “Average Sense” a speedup of $2 - 1/N$ is both necessary and sufficient for a CIOQ switch to emulate Output Queued switch behavior. In the “Average Sense”, the speedup requirement is measured as an average over different cell times.

Our paper reports that in the “Strict Sense” a speedup of 2 is both necessary and sufficient. By the “Strict Sense”, we mean that the speedup is the same in all cell times, and we compute the “Minimum” speedup that is required in any cell time. Using the same assumptions as in [26], [27] and examples for 2×2 and 3×3 switches, we show that in the “Strict Sense” a speedup of $2 - 1/N$ is not sufficient to emulate the behavior of an Output Queued switch.
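To make the two definitions concrete, the sketch below describes a schedule as a list giving the number of scheduling phases (whole-cell transfers per port) used in each cell time. Following the descriptions above, the “Average Sense” speedup is the total number of phases divided by the number of cell times, while the “Strict Sense” speedup must be the same in every cell time and so has to cover the busiest one. The schedule of $2N - 1$ phases spread over $N$ cell times is a hypothetical example chosen so that its average equals $2 - 1/N$; it is not a traffic pattern from the paper, and all function names are illustrative.

```python
from fractions import Fraction


def average_sense_speedup(phases_per_slot: list[int]) -> Fraction:
    """Average number of scheduling phases per cell time."""
    return Fraction(sum(phases_per_slot), len(phases_per_slot))


def strict_sense_speedup(phases_per_slot: list[int]) -> int:
    """Smallest speedup that is identical in every cell time and still
    covers the busiest cell time."""
    return max(phases_per_slot)


def example_schedule(n: int) -> list[int]:
    """Hypothetical schedule: 2N - 1 phases over N cell times, i.e. two
    phases in every cell time except one, which needs a single phase."""
    return [2] * (n - 1) + [1]


for n in (2, 3, 8):
    s = example_schedule(n)
    print(n, average_sense_speedup(s), strict_sense_speedup(s))
# 2 3/2 2
# 3 5/3 2
# 8 15/8 2
```

The average approaches 2 from below as $N$ grows, but any cell time that needs two phases forces a constant, per-cell-time speedup of 2.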

Also, using a constructed traffic pattern and the same assumptions as in [26], we show that in the “Strict Sense” a speedup of 2 is necessary to emulate the behavior of an Output Queued switch for any switch size $N$.

Combining this result with the previous scheduling schemes of Stoica and Zhang [25] or Chuang et al. [26], [27], we conclude that in the “Strict Sense” a speedup of 2 is both necessary and sufficient to emulate the behavior of an Output Queued switch using a Combined Input Output Queued switch [29].

Additionally, by relaxing the condition that each packet is sent through the switch as a single unit and allowing it to be segmented into smaller units, it is shown that in the “Strict Sense” the speedup required to emulate the behavior of an Output Queued switch may be reduced. For this case, a lower bound of 3/2 and an upper bound of 2 are proved.

Finally, again in the “Strict Sense”, it is proved that as $N$ approaches infinity, and even as the segment size becomes infinitesimally small, the speedup requirement is a non-decreasing function of $N$ and converges to a value between 3/2 and 2.

The organization of the paper is as follows. In Section 2, examples for 2×2 and 3×3 switches demonstrate that a speedup of $2 - 1/N$ is insufficient in the “Strict Sense”. In Section 3, the necessary condition for 2×2 switches is derived. Using this result, in Section 4, a worst-case traffic pattern is constructed to prove that in the “Strict Sense” a speedup of 2 is necessary for any switch size $N$. Section 5 demonstrates the improvement in speedup, and its limitations, when packets may be segmented into smaller units. Finally, Section 6 concludes the paper.

Section snippets

Insufficiency argument: examples

In the “Strict Sense”, we use the following assumptions, which are the same as those used in [26].

(1) Packets are of the same size.
(2) Packets arriving in a given timeslot cannot leave before the start of the next timeslot. In other words, they must be completely received before they can start to be transmitted.
(3) Packet transmission time is the timeslot length divided by the internal speedup factor.
(4) If the speedup factor is less than 2, two complete packets cannot leave from the same input port or arrive
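Assumptions (3) and (4) together bound how many intact packets a port can move in one timeslot. The sketch below evaluates that bound, assuming each transfer takes $1/S$ of a timeslot and that a transfer may not straddle a timeslot boundary (the second point is our illustrative assumption). Under it, any speedup below 2, including $2 - 1/N$, still allows only one whole packet per input or output per timeslot. The function name is hypothetical.

```python
import math
from fractions import Fraction


def whole_packets_per_slot(speedup: Fraction) -> int:
    """Intact packets one port can send (or receive) in a single timeslot,
    assuming each transfer takes 1/speedup of a timeslot (assumption 3)
    and may not straddle a timeslot boundary (illustrative assumption)."""
    return math.floor(speedup)


for n in (2, 3, 16):
    s = 2 - Fraction(1, n)  # the "Average Sense" speedup 2 - 1/N
    print(n, s, whole_packets_per_slot(s))
# 2 3/2 1
# 3 5/3 1
# 16 31/16 1
```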

Main idea: necessary condition for $2 \times 2$ switches

The question then arises: what speedup is necessary for a 2×2 switch in the “Strict Sense”? To answer this question, we use the following theorem.

Theorem 1

In the “Strict Sense”, a speedup of 2 is necessary for a 2×2 switch to emulate the OQ behavior.

Proof

The proof is based on the systematic traffic pattern depicted in Fig. 5(a), from which the following observations are made.

(a) Let us assume that the speedup factor is less than 2; shortly it will

Generalizing the necessary condition for any switch size

A general traffic pattern for any switch size can now be constructed, as shown in Fig. 6(a). Using this traffic pattern, we show that in the “Strict Sense” the same speedup factor of 2 is necessary for any switch size. In fact, as Fig. 6(a) shows, it suffices for this traffic pattern to use only two of the input ports. (Note that we can in fact use the same traffic pattern of Fig. 5(a) for an $N \times N$ switch and assume that only 2 of the input ports and

Speedup requirements when packet segmentation is allowed

So far, it has been assumed that packets are switched intact, without segmentation (the 5th assumption of Section 2).

It will be shown that relaxing this condition, that is, segmenting each packet into smaller fragments and allowing each fragment to be switched independently, reduces the speedup requirement of the switch.
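A rough way to see why segmentation helps is to apply assumption (3) to fragments instead of whole packets. If each packet is cut into fragments of size $f$ (a fraction of a packet, e.g. $f = 1/k$) and a fragment takes $f/S$ of a timeslot, then a port can move $\lfloor S/f \rfloor$ fragments, i.e. $\lfloor S/f \rfloor \cdot f$ packets' worth of data, per timeslot. With fragments, a non-integer speedup such as 3/2 is no longer rounded down to a single packet per timeslot. This is an illustrative sketch under those assumptions (including that a fragment cannot straddle a timeslot boundary), not the paper's derivation of the 3/2 and 2 bounds; the function name is hypothetical.

```python
import math
from fractions import Fraction


def data_per_slot(speedup: Fraction, fragment: Fraction) -> Fraction:
    """Packets' worth of data one port can move in a timeslot when each
    packet is split into fragments of size `fragment` (a fraction of a
    packet). Assumes a fragment takes fragment/speedup of a timeslot and
    cannot straddle a timeslot boundary (illustrative assumptions)."""
    if not 0 < fragment <= 1:
        raise ValueError("fragment must be in (0, 1]")
    fragments_per_slot = math.floor(speedup / fragment)
    return fragments_per_slot * fragment


s = Fraction(3, 2)
for k in (1, 2, 3, 4):
    print(k, data_per_slot(s, Fraction(1, k)))
# 1 1
# 2 3/2
# 3 4/3
# 4 3/2
```

Fragment sizes that divide the speedup evenly let the full 3/2 be used, while others (such as $f = 1/3$ here) still lose part of it to rounding.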

This raises the following question:

(a) “What is the speedup requirement for a given fragment size $f$ and/or switch size $N$?”

Conclusion

In this paper, by introducing a constructive method for creating a traffic pattern and using the same assumptions as in [26], we show that in the “Strict Sense” a speedup factor of 2 is necessary, for a switch of any size, to emulate the behavior of an Output Queued switch using a Combined Input Output Queued switch (assuming that packets are switched intact inside the switch).

Combining this result with the previous schemes of [25], [26], in the “Strict Sense”, makes the speedup

References (30)

A. Kesselman et al., Scheduling policies for CIOQ switches, Elsevier Journal of Algorithms (2006).

M. Karol et al., Input versus output queueing on a space-division packet switch, IEEE Transactions on Communications (1987).

M. Hluchyj et al., Queueing in high-performance packet switching, IEEE Journal on Selected Areas in Communications (1988).

M.J. Karol et al., Performance analysis of a growable architecture for broad-band packet (ATM) switching, IEEE Transactions on Communications (1992).

J. Choi et al., Performance study of an input and output queueing ATM switch with a window scheme and a speed constraint, Springer Journal of Telecommunication Systems (1996).

A. Pattavina, Multichannel bandwidth allocation in a broadband packet switch, IEEE Journal on Selected Areas in Communications (1988).

M.J. Karol, K.Y. Eng, H. Obara, Improving the performance of input-queued ATM packet switches, IEEE, INFOCOM ’92....

P.S. Min et al., A nonblocking architecture for broadband multichannel switching, IEEE/ACM Transactions on Networking (TON) (1995).

A.Y.M. Lin et al., On the performance of an ATM switch with multi-channel transmission groups, IEEE Transactions on Communications (1993).

Y. Oie, M. Murata, K. Kuota, H. Miyahara, Effect of speedup in nonblocking packet switch, IEEE International Conference...

S.C. Liew, Performance of input-buffered and output-buffered ATM switches under bursty traffic: simulation study, IEEE...

A.K. Gupta, N.D. Georganas, Analysis of a packet switch with input and output buffers and speed constraints, in:...

H. Kim et al., Performance analysis of the multiple input-queued packet switch with the restricted rule, IEEE/ACM Transactions on Networking (TON) (2003).

Jiang Xie et al., Speedup and buffer division in input/output queueing ATM switches, IEEE Transactions on Communications (2003).

W.J. Dally, Virtual-channel flow control, IEEE Transactions on Parallel and Distributed Systems (1992).
