# "SUPER-FAST" ESTIMATION OF CELL LOSS RATE AND CELL DELAY PROBABILITY OF ATM SWITCHES

Junjie Wang, K. Ben Letaief, and M. Hamdi \*

The Hong Kong University of Science and Technology Clear Water Bay, Hong Kong

- Abstract: In this paper, we consider the evaluation of the cell loss rate (CLR) and cell delay probability (CDP) in nonblocking ATM switches using computer simulations. Specifically, we investigate the application of *importance sampling* techniques as a "super-fast" alternative to conventional Monte Carlo simulation in finding the CLR and CDP in nonblocking ATM switches. A novel "*split switch*" model is developed to decouple the input and output queueing behavior so as to reduce the simulation complexity. Numerical results demonstrate that considerable computation cost can be saved using importance sampling techniques while achieving a high degree of accuracy.
- Keywords: ATM Switch, Performance Evaluation, Importance Sampling, Cell Loss Rate, Cell Delay Probability.

# 1. INTRODUCTION

In this paper, we consider the performance of ATM switches with respect to the cell loss rate (CLR) and the cell delay probability (CDP, i.e., the probability distribution of the cell delay). These are among the most important issues in switch design, and a sizable amount of work has been done on the performance evaluation of ATM switches with regard to these QoS parameters [1]-[3]. Because of the complex traffic model, a closed-form solution is difficult, if not impossible, to obtain, so researchers resort to simulation-based methodologies. Unfortunately, the required CLR for a typical ATM switch is smaller than $10^{-6}$ for most practical applications. Likewise, some real-time traffic requires that the delay threshold probability (i.e., the probability that a cell experiences a delay in excess of a particular threshold) also be around $10^{-6}$ or even smaller [7]. Hence, a prohibitive amount of time is needed in conventional Monte Carlo (MC) simulations to obtain an estimate of these rare events for a particular accuracy requirement.

<sup>\*</sup>This research work was supported in part by the Hong Kong Research Grant Council under the Grant RGC/HKUST 100/92E.

Importance sampling (IS) is a promising technique that can significantly reduce the simulation time required to obtain accurate estimates [4]-[6]. In this paper, we consider the application of IS to the estimation of the CLR and CDP in nonblocking ATM switches. Note that the application of IS to the estimation of the CLR in ATM switches has been considered in [3; 6; 8]. However, in these studies, the switch model was a space-division ATM switch with *only* output queues. In this paper, we consider the more practical case of ATM switches with both input and output queues, which makes the analysis more complicated.

The rest of the paper is organized as follows. Section 2 gives a brief introduction to the concept of IS for the estimation of rare events. In Section 3, we propose the "*split switch*" model, in which the performance estimation is divided into two sub-problems that deal with the input and output queueing respectively, and we develop IS biasing schemes for each. Section 4 presents simulation results illustrating the accuracy and efficiency of the proposed IS schemes. Finally, we conclude in Section 5.

# 2. IMPORTANCE SAMPLING OF RARE EVENTS

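Let $\alpha = \Pr(X \in E)$ denote the probability of a rare event *E*, where *X* has density f(.). Using conventional MC simulation, the estimator for $\alpha$ is simply the sample mean estimator based on a sequence of *i.i.d.* samples $X^{(1)}, ..., X^{(L)}$ from the density f(.). That is,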

$$\hat{\alpha} = \frac{1}{L} \sum_{\ell=1}^{L} I_E(X^{(\ell)}), \tag{1}$$

where $I_E$ is the indicator function of the event *E*. In practice, about $100/\alpha$ samples are required to obtain a reliable estimate with 10% accuracy. Hence, the estimation of probabilities of the order of $10^{-6}$ to $10^{-9}$ (which are quite common in communication networks) is difficult because of the prohibitive simulation time.
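The $100/\alpha$ rule of thumb can be recovered with a short calculation, sketched here under the assumption that "10% accuracy" means a relative standard error of 0.1. Since $I_E(X)$ is a Bernoulli variable with mean $\alpha$,

$$\operatorname{Var}[\hat{\alpha}] = \frac{\alpha(1-\alpha)}{L}, \qquad \frac{\sqrt{\operatorname{Var}[\hat{\alpha}]}}{\alpha} = \sqrt{\frac{1-\alpha}{\alpha L}} \le 0.1 \iff L \ge \frac{100(1-\alpha)}{\alpha} \approx \frac{100}{\alpha} \quad (\alpha \ll 1).$$

For $\alpha = 10^{-6}$, this gives roughly $10^{8}$ samples.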

Importance sampling (IS) involves choosing  $f^*(.)$  as the simulation density such that  $f^*(x) > 0$  whenever f(x) > 0 [4]. The IS estimator is then given by

$$\hat{\alpha}^* = \frac{1}{L} \sum_{\ell=1}^{L} I_E(X^{(\ell)}) w(X^{(\ell)}) \tag{2}$$

where  $X^{(1)}, ..., X^{(L)}$  are now *i.i.d* samples from the IS simulation density  $f^*(.)$  and w(.) is called the *importance sampling weight*, which is defined as the ratio of the true density f(.) to  $f^*(.)$ .
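As an illustration of estimator (2), the following minimal sketch estimates a Gaussian tail probability. The example is not from the paper: the target event $E = \{X > 4\}$ with $X \sim N(0,1)$, the mean-shifted simulation density $f^* = N(4, 1)$, and all function names are assumptions chosen only to show how the indicator and the weight $w(x) = f(x)/f^*(x)$ enter the sum.

```python
import math
import random


def normal_pdf(x, mean=0.0):
    """Density of a unit-variance Gaussian with the given mean."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def is_estimate(threshold, shift, num_samples, seed=0):
    """Estimate alpha = Pr(X > threshold), X ~ N(0, 1), via estimator (2).

    Samples are drawn from the biased density f* = N(shift, 1), and each
    sample that falls in the event is weighted by w(x) = f(x) / f*(x).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(shift, 1.0)  # sample from f*
        if x > threshold:          # indicator I_E(x)
            total += normal_pdf(x) / normal_pdf(x, shift)  # weight w(x)
    return total / num_samples


# Exact tail probability for comparison: Q(t) = erfc(t / sqrt(2)) / 2.
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))
estimate = is_estimate(threshold=4.0, shift=4.0, num_samples=100_000)
print(f"exact={exact:.3e}  IS estimate={estimate:.3e}")
```

With the shifted density, about half of the samples fall in the event, whereas plain MC with the same number of samples would expect only a handful of hits.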

Equation (2) shows that the performance of IS depends entirely on the selection of $f^*(.)$. It is easily shown that $\operatorname{Var}^*[\hat{\alpha}^*] = \operatorname{Var}^*[I_E(X) w(X)]/L$. Let $L_{\zeta} \triangleq \min \{L : \frac{\sqrt{\operatorname{Var}^*[\hat{\alpha}^*]}}{\alpha} \leq \zeta\}$. Then, $L_{\zeta}$ is the minimum number of IS runs required to obtain a $100 \times \zeta\%$ accuracy. To minimize $L_{\zeta}$, or equivalently to maximize the computational efficiency, we need to minimize $\operatorname{Var}^*[I_E(X) w(X)]$. The optimal solution $f_o^*(.)$ is given by $f_o^*(x) = I_E(x) f(x)/\alpha$. This optimal solution is quite general and results in an IS estimator with zero variance [8]. Unfortunately, it is not practical since it involves the unknown quantity $\alpha$ that we are trying to estimate. However, it is useful since it indicates the features of good IS densities. For example, it implies that good IS densities which achieve high efficiencies should be biased in a way that "favors" the "important" or rare events of interest so that they occur more frequently. Thus, the fundamental problem in any IS scheme is to find a suitable simulation density that reduces the variance of the IS estimator and hence the number of runs needed to achieve a given accuracy.
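The zero-variance property can be checked directly from the definitions. Under the optimal density every sample falls in *E*, so the weight is constant:

$$w(x) = \frac{f(x)}{f_o^*(x)} = \frac{\alpha}{I_E(x)} = \alpha \quad \text{for } x \in E, \qquad \text{hence} \quad I_E(X)\, w(X) = \alpha \ \text{with probability 1 under } f_o^*, \quad \operatorname{Var}^*[I_E(X)\, w(X)] = 0.$$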

# 3. FAST ESTIMATION OF CLR AND CDP

In this section, we consider the application of IS to the estimation of CLR and CDP in nonblocking ATM switches as shown in Fig. 1. The dimension of the switch is N, and the capacities of the input and output buffers are K and L, respectively. The switch has a speed-up factor m. Hence, up to m head-of-line (HOL) cells can be selected simultaneously in an output conflict. ATM cell arrivals on the N input ports are governed by *i.i.d.* Bernoulli processes with intensity $\lambda$.



Figure 1 A Nonblocking Space-division ATM switch.

The application of IS to CLR estimation in output-queueing switches was investigated in [8], where some efficient schemes were proposed. In this paper, we extend the methods in [8] to ATM switches with both input and output queueing. Note that for ATM switches with I/O queueing, incoming cells may be lost at both the input and output queues when they find the queue full. We denote the total number of cells arriving at the switch as $N_a$ and the numbers of cells lost due to input and output buffer overflow as $N_i$ and $N_o$, respectively. Let $\gamma_i$ and $\gamma_o$ be the CLR in the input and output queues, respectively. It then follows that $\gamma_i = \frac{N_i}{N_a}$ and $\gamma_o = \frac{N_o}{N_a - N_i}$. Since $N_i$ is quite small compared with $N_a$, the CLR of the whole switch, $\gamma$, can be obtained with negligible loss of accuracy as

$$\gamma = \frac{N_o + N_i}{N_a} \approx \gamma_i + \gamma_o. \tag{3}$$
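The size of the approximation error in (3) can be made explicit. Substituting $N_o = \gamma_o (N_a - N_i)$ from the definition of $\gamma_o$ gives

$$\gamma = \frac{N_i}{N_a} + \frac{N_o}{N_a} = \gamma_i + \gamma_o \, \frac{N_a - N_i}{N_a} = \gamma_i + \gamma_o (1 - \gamma_i),$$

so the approximation differs from the exact value only by the product $\gamma_i \gamma_o$, which is negligible when both loss rates are small.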

For CDP, if we denote the cell delays incurred at the input queue and the output queue as $d_i$ and $d_o$ (measured in slots), respectively, and treat them as independent, we can compose the probability distribution of the overall delay d by convolution. That is,

$$\Pr(d = n) = \sum_{j=0}^{n} \Pr(d_i = j) \Pr(d_o = n - j). \tag{4}$$
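The convolution in (4) is straightforward to carry out numerically once the two delay distributions have been estimated. The sketch below is illustrative only: the function names and the toy distributions are hypothetical and are not taken from the paper's results.

```python
def convolve_pmf(pmf_in, pmf_out):
    """Return the pmf of d = d_i + d_o as in (4).

    pmf_in[j] = Pr(d_i = j) and pmf_out[k] = Pr(d_o = k), delays in slots.
    """
    total = [0.0] * (len(pmf_in) + len(pmf_out) - 1)
    for j, p_in in enumerate(pmf_in):
        for k, p_out in enumerate(pmf_out):
            total[j + k] += p_in * p_out
    return total


def tail_probability(pmf, threshold):
    """Pr(d > threshold), i.e. the delay threshold probability."""
    return sum(pmf[threshold + 1:])


# Hypothetical toy distributions, used only to exercise the functions.
pmf_input = [0.7, 0.2, 0.1]
pmf_output = [0.5, 0.5]
pmf_total = convolve_pmf(pmf_input, pmf_output)
print(pmf_total, tail_probability(pmf_total, 2))
```

The same `tail_probability` helper applies to any threshold t, so a single convolution yields the whole delay threshold curve.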

One of the key contributions in this paper is the proposal of a "split switch" model, which decouples the queueing behavior of the input and output buffers according to (3) and (4). Therefore, we can estimate  $\gamma_i$  and  $\gamma_o$  (or  $d_i$  and  $d_o$ ) separately and then combine these results to yield the overall result. In our "split switch" model, two variants of input/output queuing schemes, named VIQ (Variant of Input Queuing) and VOQ (Variant of Output Queuing) as shown in Fig. 2, are developed in order to deal with the input and output queueing, respectively.



Figure 2 VIQ and VOQ schemes.

In the traditional input queuing scheme, the speed-up factor m is usually equal to 1, since HOL blocking forms a bottleneck for the throughput of the output line. However, to remain equivalent to the original switch model, the original speed-up factor m (m > 1) is kept in the VIQ scheme. On the other hand, in traditional output queuing schemes, cells that are not selected in the output contention may be lost. In our VOQ scheme, they instead stay at the HOL of the "virtual input queue". Specifically, the VOQ can be viewed as a combination of: (1) a group of virtual input queues; (2) an address filter to resolve output contention; and (3) a tagged output queue. The "split switch" model thus precisely describes the operation of the original switch. The key advantage of this model is that CLR and CDP estimation in the original ATM switch can be divided into two sub-problems, which are handled separately in the VIQ and VOQ parts.

## 3.1 IS SCHEMES FOR VIQ

In the VIQ scheme, we can focus on a single input queue to obtain the CLR and CDP estimates. Suppose that $I_n$ is the length of the tagged queue in the *n*-th slot, and let the numbers of arriving and departing cells in the *n*-th slot be $H_n$ and $G_n$, respectively. We can then write

$$I_n = \max\left\{\min\{K, I_{n-1} + H_n - G_n\}, 0\right\} \tag{5}$$

where

$$\Pr(H_n = 1) = \lambda \tag{6}$$

$$\Pr(G_n = 1) = \min\{1, m/D_n\}. \tag{7}$$

In the above equations, $D_n$ is the number of HOL cells which have the same destination as the head of the tagged queue. Note that although we focus on a single input queue, there exist strong correlations between the HOL cells in parallel queues, which together form an N-dimensional queueing process. It is hence difficult to obtain a closed-form solution to such a problem without some independence approximations [1; 2].
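The recursion (5)-(7) for the tagged queue can be sketched as a single-slot update. This is a minimal illustration, not the authors' simulator: the function name and argument names are hypothetical, and $D_n$ is assumed to be supplied by the caller (with $D_n \ge 1$), since it depends on the state of the other input queues.

```python
import random


def viq_step(queue_len, contenders, lam, m, K, rng):
    """Advance the tagged input queue by one slot following (5)-(7).

    queue_len  -- I_{n-1}, the queue length in the previous slot
    contenders -- D_n, HOL cells sharing the tagged head's destination
                  (assumed >= 1 and provided by the caller)
    lam, m, K  -- arrival intensity, speed-up factor, input buffer size
    rng        -- a random.Random instance
    """
    arrivals = 1 if rng.random() < lam else 0            # H_n, eq. (6)
    p_selected = min(1.0, m / contenders)                # eq. (7)
    departures = 1 if rng.random() < p_selected else 0   # G_n
    return max(min(K, queue_len + arrivals - departures), 0)  # eq. (5)


rng = random.Random(1)
length = 0
for _ in range(10):
    length = viq_step(length, contenders=3, lam=0.8, m=2, K=8, rng=rng)
print(length)
```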

The basic idea behind the IS scheme for VIQ is to bias the probability that the cell is selected in the output contention. That is, we make the cell at the head of the tagged queue less likely to be selected in the output contention. It is then more likely to stay in the queue and hold back the arriving cells, thereby incurring longer delays and more losses. This is done as follows. Suppose that the head of the tagged queue is destined for output j. Then all the HOL cells which are also destined for output j except the tagged one, i.e., $D_n - 1$ cells, first contend for (m - 1) winners. After that, all the cells that failed in the first round of selection plus the tagged one contend for the last chance. As a result,

$$\Pr^*(G_n = 1) = \begin{cases} 1, & \text{when } D_n \le m\\ \frac{1}{D_n - (m-1)}, & \text{when } D_n > m \end{cases} \tag{8}$$
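The biased selection probability (8) and the corresponding per-slot likelihood ratio can be sketched as follows. The paper does not spell out the weight computation, so the `slot_weight` helper is an assumption based on the general definition of the IS weight as the ratio of the true to the biased probability of the observed outcome; all function names are hypothetical.

```python
def select_prob(D, m):
    """Unbiased probability (7) that the tagged HOL cell is selected."""
    return min(1.0, m / D)


def select_prob_biased(D, m):
    """Biased probability (8): the tagged cell competes only for the last slot."""
    return 1.0 if D <= m else 1.0 / (D - (m - 1))


def slot_weight(selected, D, m):
    """Ratio Pr / Pr* of the observed outcome G_n in one slot (assumed form)."""
    p, p_star = select_prob(D, m), select_prob_biased(D, m)
    if selected:
        return p / p_star
    if D <= m:
        # With D <= m the tagged cell is always selected under both measures.
        raise ValueError("cell cannot be rejected when D <= m")
    return (1.0 - p) / (1.0 - p_star)


# Example: D = 4 contenders, speed-up m = 2.
print(select_prob(4, 2), select_prob_biased(4, 2))
print(slot_weight(True, 4, 2), slot_weight(False, 4, 2))
```

For D = 4 and m = 2, the selection probability drops from 1/2 to 1/3, and the weights average to one under the biased measure, as they must for an unbiased estimator.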

## 3.2 IS SCHEMES FOR VOQ

Suppose $A_n^j$ is the number of cells destined for output port j that come to the HOL of the input queues in the *n*-th slot. Next, let $C_n^j$ be the total number of HOL cells destined for output port j in the *n*-th slot, including those held over from earlier slots. It follows that

$$C_n^j = \max\left\{C_{n-1}^j - m, 0\right\} + A_n^j. \tag{9}$$

Now consider the tagged output queue and suppose that  $O_n^j$  is the length of the output queue j during the *n*-th slot. Likewise, let  $S_n^j$  denote the number of cells which arrive at output j during the *n*-th slot. Thus we have

$$O_n^j = \max\{O_{n-1}^j - 1, 0\} + S_n^j, \tag{10}$$

$$\Pr(S_n^j = k) = \begin{cases} \Pr(C_n^j = k), & \text{when } k < m\\ \sum_{l=m}^{N} \Pr(C_n^j = l), & \text{when } k = m\\ 0, & \text{otherwise.} \end{cases} \tag{11}$$
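Recursions (9)-(11) amount to a simple per-slot update, sketched below. Equation (11) says that $S_n^j$ equals $C_n^j$ truncated at m, so the sketch uses $S_n^j = \min(C_n^j, m)$. The function and argument names are hypothetical, and, as in (10), the cap at the output buffer capacity L and the counting of lost cells are left out.

```python
def voq_step(hol_prev, out_prev, new_hol, m):
    """One slot of the VOQ recursions (9)-(11) for the tagged output j.

    hol_prev -- C_{n-1}^j, HOL cells destined for output j in the previous slot
    out_prev -- O_{n-1}^j, previous length of output queue j
    new_hol  -- A_n^j, cells newly reaching the HOL for output j
    Returns the tuple (C_n^j, S_n^j, O_n^j).
    """
    hol = max(hol_prev - m, 0) + new_hol     # eq. (9)
    served = min(hol, m)                     # S_n^j, from eq. (11)
    out_len = max(out_prev - 1, 0) + served  # eq. (10)
    return hol, served, out_len


print(voq_step(hol_prev=5, out_prev=3, new_hol=1, m=2))
```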

A close observation of (9)-(11) indicates that the only randomness in the system is  $A_n^j$ , which is binomially distributed:

$$\Pr(A_n^j = k) = \binom{F_n}{k} \left(\frac{1}{N}\right)^k \left(1 - \frac{1}{N}\right)^{F_n - k} \tag{12}$$

where $F_n$ is the total number of cells coming to the HOL of all input queues in the *n*-th slot, i.e., $F_n = \sum_{j=1}^N A_n^j$. Intuitively, once the probability mass function of $A_n^j$ in (12) is known, we can apply an IS scheme to bias it and improve the estimation efficiency. However, note that $F_n$ is not constant but depends on all cell sources. This makes it difficult to derive an explicit probability mass function of $A_n^j$, which in turn makes the application of IS less straightforward.

The "virtual input queue" in the VOQ scheme is next introduced to generate traffic subject to Eqns. (9)-(12). Two biasing schemes are developed for the VOQ scheme as follows.

#### Accurate Biasing Scheme

In the original switch model, each incoming cell has an equal probability to be destined for any output port. That is,

$$\Pr(\text{Destination} = j) = \frac{1}{N}, \qquad j = 1, 2, ..., N. \tag{13}$$

To apply IS, we bias the routing probability such that the incoming cells are more likely to be destined for the tagged output queue, i.e.

$$\Pr^*(\text{Destination} = j) = \frac{M}{N}, \qquad j = 1, 2, ..., \frac{N}{M} \tag{14}$$

where M is defined as the *routing weight*. Hence we have

$$\Pr^*(A_n^j = k) = \binom{F_n}{k} \left(\frac{M}{N}\right)^k \left(1 - \frac{M}{N}\right)^{F_n - k}. \tag{15}$$

The scheme is called "accurate" since no approximation is made here (in contrast to the other biasing scheme we formulate below).
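For a given $F_n$, the IS weight implied by (12) and (15) has a simple closed form, because the binomial coefficients cancel in the ratio. The sketch below illustrates this under the assumption that the weight is taken per slot, conditional on $F_n$, and that $M < N$; the function names are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def routing_likelihood_ratio(k, F, N, M):
    """Ratio Pr(A = k) / Pr*(A = k) from (12) and (15), given F_n = F.

    The binomial coefficients cancel, leaving
    (1/M)^k * ((N - 1) / (N - M))^(F - k).
    """
    return (1.0 / M) ** k * ((N - 1) / (N - M)) ** (F - k)


# Example: N = 16 ports, routing weight M = 4, F = 10 new HOL cells.
N, M, F = 16, 4, 10
for k in range(4):
    direct = binom_pmf(k, F, 1 / N) / binom_pmf(k, F, M / N)
    print(k, routing_likelihood_ratio(k, F, N, M), direct)
```

The printed columns agree, confirming that the closed form matches the ratio of the two binomial probabilities.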

#### Approximate Biasing Scheme

In [1], it has been demonstrated that as the size of the ATM switch, N, goes to infinity, $A_n^j$ follows a Poisson distribution with intensity $\rho = \bar{F}_n/N$ (where $\bar{F}_n$ is the mean of $F_n$). That is, we have

$$P_k = \Pr(A_n^j = k) = \frac{\rho^k e^{-\rho}}{k!}. \tag{16}$$

Such an approximation is reasonable when a large-scale ATM switch is considered and if only a rough estimate is required. Therefore, we can apply some IS schemes developed in [8] to directly bias the arrival process  $A_n^j$  in (12). For example, we can use an exponential biasing scheme. As a result, we get:

$$P_k^* = \frac{\xi^k P_k}{\sum_{i=0}^{\infty} \xi^i P_i} = \frac{(\xi \rho)^k e^{-\xi \rho}}{k!}, \qquad k \ge 0 \tag{17}$$

where  $\xi$  is the bias parameter and  $\xi > 1$ . The parameter  $\xi$  should then be chosen in such a way that the sample variance is as small as possible.
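Under the exponential biasing in (17), the biased arrivals are again Poisson, with mean $\xi\rho$, and the weight for an observed value k follows directly as $P_k/P_k^* = \xi^{-k} e^{(\xi - 1)\rho}$. The sketch below checks this numerically; the function names and the parameter values are hypothetical and serve only as an illustration.

```python
import math


def poisson_pmf(k, rate):
    """Poisson probability of k events with the given mean rate."""
    return rate ** k * math.exp(-rate) / math.factorial(k)


def twisted_pmf(k, rho, xi):
    """Biased pmf (17): a Poisson law with mean xi * rho."""
    return poisson_pmf(k, xi * rho)


def twist_weight(k, rho, xi):
    """IS weight P_k / P_k^* = xi^(-k) * exp((xi - 1) * rho)."""
    return xi ** (-k) * math.exp((xi - 1.0) * rho)


# Example values (assumed): rho = 0.5, bias parameter xi = 3.
rho, xi = 0.5, 3.0
for k in range(4):
    print(k, twist_weight(k, rho, xi), poisson_pmf(k, rho) / twisted_pmf(k, rho, xi))
```

Larger values of $\xi$ push more arrivals toward the tagged output but also spread the weights more widely, which is why $\xi$ has to be tuned against the sample variance.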

# 4. NUMERICAL RESULTS

The CLR estimates under the VIQ and VOQ schemes are shown in Fig. 3 as a function of the buffer size. For comparison, we also show the MC estimates obtained with the original switch model.



Figure 3 CLR of VIQ for  $\lambda = 0.8$ , N = 16, m = 2 and VOQ for  $\lambda = 0.5$ , N = 16, K = 3 and m = 2.

After obtaining the CLR in the VIQ and VOQ schemes respectively, we can combine these estimates to yield the overall CLR of the ATM switch according to Eqn. (3). The simulation is run under the assumption that the total buffer size of the input and output queues is fixed at 32, so that different CLRs are observed for different buffer allocations. In Fig. 4, the overall CLR is plotted as a function of the input queue size. The results indicate that more buffers should be allocated to the output queue than to the input queue in order to achieve the lowest CLR.



Figure 4 CLR estimation for  $\lambda = 0.8$ , N = 16 and m = 2.

In the CDP estimation, we use IS based on the "split switch" model and then combine the CDP estimates from the VIQ and VOQ schemes according to (4). The results are compared with MC simulation in Fig. 5. We denote the delay threshold as t (measured in slots) and the delay threshold probability as $\eta$. The figure shows excellent agreement between the two approaches, while the IS scheme greatly reduces the computational burden. This computational saving is illustrated in Table 1.



Figure 5 CDP estimates for  $\lambda$ =0.8, N = 16, K = 8, L = 12, m=2.

| $\lambda$ | t  | η                     | CPU Time<br>(IS) | CPU Time<br>(MC) |
|-----------|----|-----------------------|------------------|------------------|
| 0.6       | 13 | $1.12 \times 10^{-7}$ | 6.9 minutes      | 6.31 days        |
| 0.7       | 15 | $3.14 \times 10^{-7}$ | 4.4 minutes      | 2.05 days        |
| 0.8       | 21 | $2.51 \times 10^{-7}$ | 5.8 minutes      | 2.51 days        |

Table 1 The computation gains with IS in CDP estimation (N = 16, K = 8, L = 12 and m = 2)
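The gain implied by Table 1 can be expressed as the ratio of MC to IS CPU time. The short calculation below uses only the figures reported in the table; the variable names are illustrative.

```python
# CPU times copied from Table 1: (lambda, IS time in minutes, MC time in days).
rows = [(0.6, 6.9, 6.31), (0.7, 4.4, 2.05), (0.8, 5.8, 2.51)]

MINUTES_PER_DAY = 24 * 60

# Ratio of MC to IS CPU time for each load.
speedups = {lam: mc_days * MINUTES_PER_DAY / is_min for lam, is_min, mc_days in rows}
for lam, gain in speedups.items():
    print(f"lambda={lam}: MC/IS CPU time ratio ~ {gain:.0f}")
```

The ratios range from roughly 600 to 1300, i.e., IS reduces the CPU time by about three orders of magnitude for these settings.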

# 5. CONCLUSION

In this paper, we considered the application of IS to the estimation of the CLR and CDP of nonblocking ATM switches. We proposed the "split switch" model as an analytical tool for the performance evaluation of ATM switches with I/O queues. The IS estimates obtained using the proposed methodologies were shown to be in excellent agreement with MC simulations, which are indicative of exact system performance. In addition, it has been demonstrated that a considerable computational burden can be saved using our IS schemes, which indicates the good potential of these IS techniques for use in conjunction with real-time admission control algorithms. Finally, we plan to extend these results to more realistic traffic models, and to investigate the potential of IS techniques as a real-time method for estimating *Quality of Service* parameters in ATM networks and their integration with real-time admission control algorithms.

# References

- [1] M. J. Karol, M. G. Hluchyj, and S. P. Morgan, "Input versus output queueing on a space-division packet switch," *IEEE Trans. on Commun.*, vol. 35, pp. 1347-1356, Dec. 1987.
- [2] M. J. Lee and D. S. Ahn, "Cell loss analysis and design trade-offs of nonblocking ATM switches with nonuniform traffic," *IEEE/ACM Trans. on Networking*, vol. 3, No. 2, pp. 199-209, Apr. 1995.
- [3] Q. L. Wang and V. S. Frost, "Efficient estimate of cell loss blocking probability for ATM systems," *IEEE/ACM Trans. on Networking*, vol. 1, No. 2, pp. 230-235, Apr. 1993.
- [4] P. Heidelberger, "Fast simulation of rare events in queuing and reliability models," ACM Trans. on Modeling and Computer Simulation, vol. 5, No.