
Ordinal Optimization and Quantification of Heuristic Designs


Abstract

This paper focuses on the performance evaluation of complex man-made systems, such as assembly lines, the electric power grid, traffic systems, and various paper-processing bureaucracies. For such problems, the traditional tools of mathematical programming and gradient-based continuous-variable optimization are often inappropriate or infeasible, since the design variables are usually discrete and accurately evaluating the system performance with a simulation model can be computationally expensive. General search methods and heuristic methods are essentially the only practical ways to tackle such problems. However, the "goodness" of heuristic methods is generally difficult to quantify, while search methods often require evaluating the system at many design choices in a large search space using a simulation model, resulting in an infeasible computational burden. The purpose of this paper is to address both difficulties simultaneously by extending the recently developed methodology of Ordinal Optimization (OO). When applying OO, uniform samples are drawn from the whole search space and evaluated with a crude but computationally cheap model. We argue that, after being ordered by their crude performance estimates, these uniform samples can serve as an approximate ruler. By comparing a heuristic design against this ruler, we can quantify the heuristic design, just as we measure the length of an object with a ruler. In a previous paper we showed how to quantify a heuristic design in a special case, but without the OO ruler idea. In this paper we propose the OO ruler idea and extend the quantification method to the general case and to the case of multiple independent results. Experimental results of applying the ruler are also given to illustrate the utility of this approach.


Notes

  1. For example, we use a short simulation or a simulation with only a few replications.

  2. In this paper we interpret n% as a number between 0 and 1, i.e., the “n” in n% is a number between 0 and 100.

  3. The I.I.D. (independent and identically distributed) noise assumption is a basic assumption of OO theory. The assumption is reasonable because, roughly speaking, all the designs are subject to the same stochastic environment. With this assumption, a truly good design has a better chance of being observed to be good, and then by comparing the observed performances we are able to select good enough designs. Also, for quantifying heuristics, this assumption is the basis for building theoretical conclusions on the comparison between a heuristic design and uniform samples. In practice, however, OO can still work well even when the noises are not I.I.D. (Ho 1999).

  4. For the no-noise case, if |N| = 1,000, the samples divide the search space into 1,001 quantiles on average. If a heuristic design is observed to be better than all the |N| samples, then on average we should judge it to be within the top 1/1,001, which is about the top 0.1% of the search space. By the formula for the noisy case, however, we judge it to be within the top 11.39%, which is of course worse (more conservative) than the no-noise case; a small numerical illustration is given after these notes.

  5. In this paper, N and N_i can refer either to the set of designs or to the number of designs in the set, when this causes no ambiguity.

  6. For convenience, let us assume that we always finish sampling such that the last sample of N is the (t − 1)-th sample that is observed to be no worse than the heuristic design. In this way there will be no unfinished segment in our discussion.

  7. It is not 0.05 here. The reason is that we need to loosen the bound on the Type II error probability in order to make the ruler work. The details are given in the proof in Section 3.3.

  8. Sequential Probability Ratio Test (SPRT) can be used to control the two types of errors at the same time. This can be a future topic.
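
As a small numerical illustration of note 4 above (a sketch only, assuming the standard-length constant 113.9 from Theorem 1), the following Python snippet compares the no-noise quantile judgment with the noisy-case judgment implied by N_0 = 113.9/n_0%:

```python
# Illustration of note 4; the constant 113.9 comes from Theorem 1, everything else is arithmetic.
num_samples = 1000  # |N|: uniform samples, all observed worse than the heuristic design

# No-noise case: the samples divide the search space into |N| + 1 quantiles on average.
no_noise_top = 1.0 / (num_samples + 1)   # ~0.001, i.e., about the top 0.1%

# Noisy case: invert N0 = 113.9 / n0% to get the claimable ordinal position.
noisy_top = 113.9 / num_samples          # 0.1139, i.e., the top 11.39%

print(f"no-noise judgment  : top {no_noise_top:.2%}")
print(f"noisy-case judgment: top {noisy_top:.2%}")
```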

Abbreviations

SYMBOL: MEANING

\(\mathit{\Theta}\): The whole search space

θ: An element of the whole search space

θ_H: A heuristic design

J(·): The true performance of a design

\(\hat{J}(\cdot)\): The observed performance of a design

N_i (i = 1,2,...,u): A segment (set) of uniform samples; also the length of the segment when there is no ambiguity

N_[r] (r = 1,2,...,u): The r-th order statistic of the segments, i.e., the r-th smallest

β_0: Bounding level for the Type II error probability

n%, n_0%: The ordinal position of a design in \(\mathit{\Theta}\); n% and n_0% lie within [0, 1]

N_0: A number defined as N_0 := 113.9/n_0%, called the "standard length" in this paper

References

  • Arnold DV (2002) Noisy optimization with evolution strategies. Springer, New York

  • Balakrishnan N, Rao CR (eds) (1998) Handbook of statistics 17, order statistics: applications. Elsevier, Amsterdam

  • Cassandras CG, Lafortune S (1999) Introduction to discrete event systems. Kluwer Academic, Dordrecht

  • Deng M, Ho YC (1999) An ordinal optimization approach to optimal control problems. Automatica 35:331–338

  • Ho YC (1989) Introduction to special issue on dynamics of discrete event systems. Proc IEEE 77(1):3–6

  • Ho YC (1999) An explanation of ordinal optimization: soft computing for hard problems. Inf Sci 113:169–192

  • Ho YC (2005) On centralized optimal control. IEEE Trans Automat Contr 50(4):537–538

  • Ho YC, Sreenivas R, Vakili P (1992) Ordinal optimization of discrete event dynamic systems. J DEDS 2(2):61–88

  • Ho YC, Zhao QC, Jia QS (2007) Ordinal optimization: soft computing for hard problems. Springer, New York

  • Hopfield JJ, Tank DW (1985) "Neural" computation of decisions in optimization problems. Biol Cybern 52:141–152

  • Knapp AW (2005) Basic real analysis. Birkhäuser, Boston, p 271

  • Pinedo M (2002) Scheduling: theory, algorithms, and systems, 2nd edn. Prentice-Hall, New York

  • Shen Z, Bai H-X, Zhao YJ (2005) Ordinal optimization references list, updated May 2007. http://cfins.au.tsinghua.edu.cn/en/resource/index.php

  • Shen Z, Zhao QC, Jia QS (2009) Quantifying heuristics in the ordinal optimization framework. J DEDS (under review)

  • Wilson GV, Pawley GS (1988) On the stability of the traveling salesman problem algorithm of Hopfield and Tank. Biol Cybern 58:63–70


Acknowledgements

The authors thank Prof. Christos G. Cassandras of Boston University for his insightful comments on this research and for helpful discussions on applying the methods of this paper to quantify the heuristic design for the routing problem of the two queues in Section 4.

Author information


Correspondence to Zhen Shen.

Additional information

This work was supported by NSFC Grants 60574067, 60704008, 60721003, and 60736027, by the NCET program of China (No. NCET-04-0094), by the 111 International Collaboration Project of China, and by the 2007 high-level graduate student scholarship of the China Scholarship Council.

Appendices

Appendix 1: Proof of the standard length idea for the noisy case

We assume that a heuristic design ranks within the top \(n_{0}{\mbox\%}\) of the search space \(\mathit{\Theta}\), and we define the standard length N_0 according to the formula in Theorem 1 as \(N_0:=\frac{1}{n_0{\mbox\%}}\min_{0<c<1}\frac{1}{c\beta_0}\ln\left(\frac{1-c\beta_0}{(1-c)\beta_0}\right)\), i.e., \(N_{0}:= 113.9/n_{0}{\mbox\%}\). When making the judgment by Theorem 1, we evaluate θ_H first, and then we draw and evaluate uniform samples one by one until we obtain a sample observed to be no worse than θ_H. The uniform samples observed to be worse than θ_H constitute the set N_1. This N_1 can be the length of the first segment in the general case, and it can also be the length of the segment of an independent experiment in the multiple-independent-results case.
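
Before the formal argument, the judgment procedure just described can be sketched in a few lines of Python. This is an illustration only, under assumed ingredients that are not part of the paper: a hypothetical crude evaluator with additive I.I.D. Gaussian noise and true performances uniform on [0, 1] (smaller is better).

```python
import random

RULER_CONSTANT = 113.9  # min over c of -ln((1-c)*beta0/(1-c*beta0))/(c*beta0), with beta0 = 0.05

def crude_eval(true_perf, noise_std):
    """Hypothetical crude evaluator: observed performance = true performance + I.I.D. Gaussian noise."""
    return true_perf + random.gauss(0.0, noise_std)

def ruler_judgment(true_perf_heuristic, n0_percent, noise_std=0.01):
    """Draw uniform samples until one is observed to be no worse than the heuristic design.

    Returns (accept, n1): accept is True iff N1 >= N0 = 113.9 / n0_percent, in which case
    Theorem 1 lets us judge the heuristic design to be within the top n0_percent of the
    search space with Type II error probability at most 0.05."""
    n0_length = RULER_CONSTANT / n0_percent       # the standard length N0
    observed_h = crude_eval(true_perf_heuristic, noise_std)
    n1 = 0                                        # number of samples observed worse than theta_H
    while True:
        theta_true = random.random()              # true performance of a uniform sample
        if crude_eval(theta_true, noise_std) > observed_h:
            n1 += 1                               # observed worse: the segment grows
        else:
            return n1 >= n0_length, n1            # first sample observed no worse: stop

# Example: a heuristic whose true performance is at the top 0.1% of the space, tested against top 10%.
accept, n1 = ruler_judgment(true_perf_heuristic=0.001, n0_percent=0.10)
print(f"segment length N1 = {n1}, N0 = {113.9 / 0.10:.0f}, judged within top 10%: {accept}")
```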

This proof is long. We divide it into several subsections.

A.1 The expression for P(N_1 ≥ N_0)

N_1 is a random variable. We number the uniform samples θ_1, θ_2, ..., in the order in which they are generated. We have

$$ \label{A1} P(N_1=m)=P(\hat{J}(\theta_\text H)<\hat{J}(\theta_1),\dots,\hat{J}(\theta_\text H)<\hat{J}(\theta_m),\hat{J}(\theta_\text H)\ge\hat{J}(\theta_{m+1})) $$
(27)

We denote the p.d.f. of the noise by f_W(x) and the c.d.f. of the noise by F_W(x); then

$$\label{eq21} \begin{array}{rcl} P(N_1=m)&=& \int_{-\infty}^{+\infty} P(J(\theta_\text H)+x<\hat{J}(\theta_1),\dots,J(\theta_\text H)+x<\hat{J}(\theta_m), J(\theta_\text H)\\&&+ x\ge\hat{J}(\theta_{m+1})|W_\text H=x)f_W(x)dx \end{array} $$
(28)

So,

$$\label{A3} \begin{array}{rcl} P(N_1\ge N_0)&=&\sum\limits_{m=N_0}^\infty P(N_1=m)\\ &=&\sum\limits_{m=N_0}^\infty\int_{-\infty}^{+\infty} P(J(\theta_\text H)+x<\hat{J}(\theta_1),\dots,J(\theta_\text H)+x<\hat{J}(\theta_m), J(\theta_\text H)\\&&+ x\ge\hat{J}(\theta_{m+1})| W_\text H=x)f_W(x)dx \end{array} $$
(29)

Next, we show that the summation and the integration can be exchanged. This follows from Fubini’s Theorem (Knapp 2005), which can be stated as follows: if A and B are σ-finite measure spaces, not necessarily complete, and if either \( \int_A(\int_B|f(x,y)|dy)dx<\infty \) or \(\int_B(\int_A|f(x,y)|dx)dy<\infty \), then \(\int_{A\times B}|f(x,y)|dxdy<\infty\) and \(\int_A(\int_B f(x,y)dy)dx=\int_B(\int_A f(x,y)dx)dy=\int_{A\times B} f(x,y)dxdy\). Recall that a positive (or signed) measure μ defined on a σ-algebra Σ of subsets of a set X is called finite if μ(X) is a finite real number (rather than ∞), and μ is called σ-finite if X is a countable union of measurable sets of finite measure; a set in a measure space is said to have σ-finite measure if it is a countable union of sets with finite measure. The Lebesgue measure on the real numbers is not finite, but it is σ-finite: consider the closed intervals [k, k + 1] for all integers k; there are countably many such intervals, each has measure 1, and their union is the entire real line. Likewise, the counting measure on the set {N_0, N_0 + 1, N_0 + 2,...}, which turns the summation over m into an integral, is σ-finite. Finally, P(N_1 ≥ N_0) is a well-defined probability and cannot be larger than 1, so the integrability condition of Fubini’s Theorem is satisfied and we have

$$\label{A4} \begin{array}{rcl} P(N_1\ge N_0) &=&\int_{-\infty}^{+\infty} \sum\limits_{m=N_0}^\infty P(J(\theta_\text H)+x<\hat{J}(\theta_1),\dots,J(\theta_\text H)+x<\hat{J}(\theta_m), J(\theta_\text H)\\&& +x\ge\hat{J}(\theta_{m+1})| W_\text H=x)f_W(x)dx \\ &=&\int_{-\infty}^{+\infty} \sum\limits_{m=N_0}^\infty P(J(\theta_\text H)+x<\hat{J}(\theta_1)|W_\text H=x)^m \left(1-P(J(\theta_\text H)+x<\hat{J}(\theta_{1})|W_\text H=x)\right)f_W(x)dx \end{array} $$
(30)

We have

$$\label{A5} \begin{array}{rcl} && \sum\limits_{m=N_0}^\infty P(J(\theta_\text H)+x<\hat{J}(\theta_1)|W_\text H=x)^m \left(1-P(J(\theta_\text H)+x<\hat{J}(\theta_{1})|W_\text H=x)\right)\\ && \quad =\left(1-P(J(\theta_\text H)+x<\hat{J}(\theta_{1})|W_\text H=x)\right)\sum\limits_{m=N_0}^\infty P(J(\theta_\text H)+x<\hat{J}(\theta_1)|W_\text H=x)^m\\ && \quad =P(J(\theta_\text H)+x<\hat{J}(\theta_1)|W_\text H=x)^{N_0} \end{array} $$
(31)

So,

$$ P(N_1\ge N_0) =\int_{-\infty}^{+\infty} P(J(\theta_\text H)+x<\hat{J}(\theta_1)|W_\text H=x)^{N_0} f_W(x)dx \label{A6} $$
(32)
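
Equation 32 can be checked numerically for a concrete model. The sketch below assumes ingredients that are not part of the paper (true performances uniform on [0, 1], J(θ_H) placed at the n_0% quantile, Gaussian N(0, σ²) noise, and an arbitrary small N_0) and compares a Monte Carlo evaluation of the right-hand side of Eq. 32 against a direct simulation of the sampling procedure; rough agreement between the two estimates is expected.

```python
import random

# Assumed illustrative model: J(theta) ~ U[0, 1] (smaller is better), J(theta_H) = n0,
# noise ~ N(0, sigma^2). N0 here is an arbitrary small value, not 113.9 / n0%.
n0, sigma, N0 = 0.10, 0.05, 50
J_H = n0

def p_obs_worse(x, reps=1000):
    """Estimate P(J(theta_H) + x < J(theta_1) + W_1), the inner probability of Eq. 32, given W_H = x."""
    total = 0.0
    for _ in range(reps):
        w1 = random.gauss(0.0, sigma)
        total += min(1.0, max(0.0, 1.0 - (J_H + x - w1)))   # P(U[0,1] > J_H + x - w1)
    return total / reps

def rhs_eq32(samples=1000):
    """Monte Carlo evaluation of the right-hand side of Eq. 32 (averaging over the noise W_H)."""
    return sum(p_obs_worse(random.gauss(0.0, sigma)) ** N0 for _ in range(samples)) / samples

def direct_mc(runs=2000):
    """Directly simulate the sampling procedure and estimate P(N1 >= N0)."""
    hits = 0
    for _ in range(runs):
        obs_h = J_H + random.gauss(0.0, sigma)
        n1 = 0
        while random.random() + random.gauss(0.0, sigma) > obs_h:
            n1 += 1
            if n1 >= N0:              # the event {N1 >= N0} already holds; stop early
                hits += 1
                break
    return hits / runs

print("Eq. 32, numerical integration:", rhs_eq32())
print("Eq. 32, direct simulation    :", direct_mc())
```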

A.2 An upper bound for P(N_1 ≥ N_0)

In this subsection we are going to prove:

$$ \begin{array}{rcl} P(N_1\ge N_0)&\leq& \int_{-\infty}^{+\infty}(1-n_0{\mbox\%}F_W(x))^{N_0}f_W(x)dx\\&\leq& \min\limits_{p\in[0,1]}p+(1-n_0{\mbox\%}\times p)^{N_0}\times (1-p). \end{array} $$

where F_W(x) is the c.d.f. of the I.I.D. noise. In fact, the content in this subsection has already appeared in the proof of Theorem 1 in Shen et al. (2009), but we repeat it here for the reader's convenience. For the expression inside the integral of Eq. 32, we have

$$ P(J(\theta_\text H)+x<\hat{J}(\theta_1))=P(J(\theta_\text H)+x<J(\theta_1)+W_1) \label{A7} $$
(33)

where W_1 stands for the noise when evaluating θ_1. By the total probability formula, we have

$$\label{A8} \begin{array}{rcl} P(J(\theta_\text H)+x<J(\theta_1)+W_1) &=&P(J(\theta_\text H)+x<J(\theta_1)+W_1|W_1<x)P(W_1<x)\\ &&+P(J(\theta_\text H)+x<J(\theta_1)+W_1|W_1\ge x)P(W_1\ge x)\\ \end{array} $$
(34)

In Eq. 34, we have

$$\label{A9} \begin{array}{rcl} P(J(\theta_\text H)+x<J(\theta_1)+W_1|W_1<x)&=&P(J(\theta_\text H)-J(\theta_1)<W_1-x|W_1<x) \\ &\le& P(J(\theta_\text H)-J(\theta_1)<0|W_1<x)\\ &=&P(J(\theta_\text H)-J(\theta_1)<0) \end{array} $$
(35)

The last "=" in Eq. 35 holds because the noise is independent of the true performance of a uniformly sampled design. For Eq. 35, we have

$$ \label{A10}P(J(\theta_\text H)-J(\theta_1)<0)\leq 1-n_0{\mbox\%} $$
(36)

It is "\(\leq 1-n_0{\mbox\%}\)" rather than "\(=1-n_0{\mbox\%}\)" because there may be designs having the same true performance as the heuristic design θ_H. When we order the designs in the search space \(\mathit{\Theta}\), ties are broken arbitrarily and the tied designs are given different but consecutive ranks. There can therefore be designs with the same performance as θ_H but with worse ranks than θ_H. From Eqs. 35 and 36, we know Eq. 34 can be changed to

$$\label{A11} \begin{array}{rcl} P(J(\theta_\text H)+x<J(\theta_1)+W_1) &\le& (1-n_0{\mbox\%})P(W_1<x)\\&&+P(J(\theta_\text H)+x<J(\theta_1)+W_1|W_1\ge x)P(W_1\ge x)\\ \end{array} $$
(37)

Since a conditional probability cannot be larger than 1, we have

$$ \label{A12} P(J(\theta_\text H)+x<J(\theta_1)+W_1|W_1\ge x)\leq 1 $$
(38)

By Eqs. 37 and 38, we have

$$\label{A13} \begin{array}{rcl} P(J(\theta_\text H)+x<J(\theta_1)+W_1)&\le& (1-n_0{\mbox\%})P(W_1<x)+P(W_1\ge x)\\ &=&(1-n_0{\mbox\%})F_W(x)+1-F_W(x)=1-n_0{\mbox\%}\times F_W(x)\\ \end{array} $$
(39)

From Eqs. 32, 33 and 39, we have

$$\label{A14} \begin{array}{rcl} P(N_1\ge N_0) &=&\int_{-\infty}^{+\infty} P(J(\theta_\text H)+x<J(\theta_1)+W_1|W_\text H=x)^{N_0} f_W(x)dx \quad (\text{due to Eqs.~32 and 33}) \\ &\le& \int_{-\infty}^{+\infty} (1-n_0{\mbox\%}\times F_W(x))^{N_0}f_W(x)dx \quad (\text{due to Eq.~39}) \end{array} $$
(40)

Next, we shall prove that the right-hand side (RHS) of Eq. 40 is no larger than \(\min_{p\in[0,1]}p+(1-n_0{\mbox\%}\times p)^{N_0}\times (1-p)\). We rewrite the RHS of Eq. 40 as follows, i.e., the integral is split into two parts at x_0, which can be any real number,

$$\label{A15} \begin{array}{rcl} \int_{-\infty}^{+\infty} (1-n_0{\mbox\%}\times F_W(x))^{N_0}f_W(x)dx &=&\int_{-\infty}^{x_0} (1-n_0{\mbox\%}\times F_W(x))^{N_0}f_W(x)dx \\&&+\int_{x_0}^{+\infty} (1-n_0{\mbox\%}\times F_W(x))^{N_0}f_W(x)dx \\ \end{array} $$
(41)

For the first term on the RHS of Eq. 41, we have

$$ \label{A16} F_W(x)\ge 0, \quad x\in(-\infty, x_0) $$
(42)

For the second term on the RHS of Eq. 41, we have

$$ \label{A17} F_W(x)\ge F_W(x_0), \quad x\in[x_0,+\infty) $$
(43)

From Eqs. 42 and 43, Eq. 41 can be changed to

$$\label{A18} \begin{array}{rcl} && \int_{-\infty}^{x_0} (1-n_0{\mbox\%}\times F_W(x))^{N_0}f_W(x)dx+\int_{x_0}^{+\infty} (1-n_0{\mbox\%}\times F_W(x))^{N_0}f_W(x)dx \\ && \quad \le \int_{-\infty}^{x_0} (1-n_0{\mbox\%}\times 0)^{N_0}f_W(x)dx+\int_{x_0}^{+\infty} (1-n_0{\mbox\%}\times F_W(x_0))^{N_0}f_W(x)dx \\ && \quad =\int_{-\infty}^{x_0} f_W(x)dx+(1-n_0{\mbox\%}\times F_W(x_0))^{N_0}\times \int_{x_0}^{+\infty} f_W(x)dx \\ && \quad =F_W(x_0)+(1-n_0{\mbox\%}\times F_W(x_0))^{N_0}\times (1-F_W(x_0)) \end{array} $$
(44)

From Eqs. 40, 41 and 44, finally,

$$\label{A19} \begin{array}{rcl} P(N_1\ge N_0) \le F_W(x_0)+(1-n_0{\mbox\%}\times F_W(x_0))^{N_0}\times (1-F_W(x_0)), \quad \forall x_0\in(-\infty,+\infty)\\ \end{array} $$
(45)

Since F_W(x_0) ∈ [0,1], we have from Eq. 45 that

$$ \label{A20} P(N_1\ge N_0) \le p+(1-n_0 \mbox{\%} \times p)^{N_0}\times (1-p), \quad \forall p\in [0,1] $$
(46)

Thus, we have

$$\label{A21} \begin{array}{rcl} P(N_1\ge N_0)&\leq& \int_{-\infty}^{+\infty}(1-n_0{\mbox\%}F_W(x))^{N_0}f_W(x)dx\\&\leq& \min_{p\in[0,1]}p+(1-n_0{\mbox\%}\times p)^{N_0}\times (1-p). \end{array} $$
(47)

Now our problem is to optimize over p in Eq. 47. If we can find the minimum of the expression in Eq. 47, we will find an upper bound on the probability of making a Type II error.
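
Before carrying out the analytical treatment in the next subsection, the minimization over p in Eq. 47 can also be done numerically. The sketch below (an aside, not part of the proof) uses a simple grid search and shows that with N_0 = 113.9/n_0% the resulting bound stays at or below β_0 = 0.05 for several values of n_0%:

```python
def type2_bound(n0_percent, grid=100000):
    """Numerically minimize p + (1 - n0% * p)^{N0} * (1 - p) over p in [0, 1],
    with N0 = 113.9 / n0% as in the standard-length definition."""
    N0 = 113.9 / n0_percent
    best = 1.0
    for i in range(grid + 1):
        p = i / grid
        best = min(best, p + (1.0 - n0_percent * p) ** N0 * (1.0 - p))
    return best

for n0 in (0.001, 0.01, 0.05, 0.2):
    print(f"n0% = {n0:5.3f}: min over p = {type2_bound(n0):.4f}  (<= 0.05)")
```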

A.3 Finishing the proof using the upper bound

We will prove that P(N_1 ≥ N_0) ≤ β_0 = 0.05, where N_0 is defined through n_0 as \(N_{0}= 113.9/n_{0}{\mbox\%}\). Before going further, we define a parameter c_0 as follows.

$$ \label{A22} c_0=\arg \min_{0<c<1} \frac{-\ln\left(\frac{(1-c)\beta_0}{1-c\beta_0}\right)}{c\beta_0}, \quad \beta_0=0.05. $$
(48)

As can easily be checked, there is a value of c that makes Eq. 48 attain its minimum, and c_0 is very close to 0.826. With c_0 = 0.826, we have

$$ \label{A23} \frac{-\ln\left(\frac{(1-c)\beta_0}{1-c\beta_0}\right)}{c\beta_0}=113.9, \quad \beta_0=0.05,\quad c=c_0=0.826. $$
(49)

Since the definition of N_0 gives \(n_0{\mbox\%}\times N_0 = 113.9\) (or slightly more if N_0 is rounded up to an integer), this implies

$$ \label{e1} n_0{\mbox\%}\times N_0 \geq \frac{-\ln\left(\frac{(1-c_0)\beta_0}{1-c_0\beta_0}\right)}{c_0\beta_0}, $$
(50)
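
For readers who wish to reproduce the constants, a small grid search over c (a numerical sketch only) recovers c_0 ≈ 0.826 and the value 113.9 used in Eqs. 48 and 49:

```python
import math

beta0 = 0.05

def objective(c):
    """The quantity minimized in Eq. 48: -ln((1 - c) * beta0 / (1 - c * beta0)) / (c * beta0)."""
    return -math.log((1.0 - c) * beta0 / (1.0 - c * beta0)) / (c * beta0)

# Coarse grid search over c in (0, 1); the minimum is quite flat around c ~ 0.826.
c0 = min((i / 10000 for i in range(1, 10000)), key=objective)
print(f"c0 ~= {c0:.3f}, minimum value ~= {objective(c0):.1f}")   # roughly 0.826 and 113.9
```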

We define p_0 as follows.

$$ \label{A23} p_0=c_0\beta_0,\quad \beta_0=0.05. $$
(51)

From the definition, we know p_0 is within [0,1]. From Eq. 47, we have

$$\label{A24} \begin{array}{rcl} P(N_1\ge N_0)&\leq& \min\limits_{p\in[0,1]}p+(1-n_0{\mbox\%}\times p)^{N_0}\times (1-p) \\ &\le& p_0+(1-n_0{\mbox\%}\times p_0)^{N_0}\times (1-p_0) \end{array} $$
(52)

With p_0, Eq. 50 can be written as

$$ \label{e2} n_0{\mbox\%}\times N_0 \geq \frac{-\ln\left(\frac{\beta_0-p_0}{1-p_0}\right)}{p_0}, $$
(53)

As can easily be proved, \(1-\exp(x) \le -x\) for any real value x (equivalently, \(1+x\le \exp(x)\)). So we know

$$ \label{A26} 1-\exp\left(\frac{\ln\left(\frac{\beta_0-p_0}{1-p_0}\right)}{N_0}\right) \le -\frac{\ln\left(\frac{\beta_0-p_0}{1-p_0}\right)}{N_0} $$
(54)

It then follows from Eq. 53 that

$$ \label{e3} 1-\exp\left(\frac{\ln\left(\frac{\beta_0-p_0}{1-p_0}\right)}{N_0}\right) \le n_0{\mbox\%}\times p_0 $$
(55)

Indeed, Eq. 55 can be rearranged as \(1-n_0{\mbox\%}\times p_0 \le \left(\frac{\beta_0-p_0}{1-p_0}\right)^{1/N_0}\); raising both (positive) sides to the power N_0, multiplying by (1 − p_0), and adding p_0 gives

$$ \label{e4} p_0+(1-n_0{\mbox\%}\times p_0)^{N_0}\times (1-p_0)\le \beta_0 $$
(56)

As a result, combining Eqs. 52 and 56, we have

$$ \label{A24} P(N_1\ge N_0)\leq \min_{p\in[0,1]}p+(1-n_0{\mbox\%}\times p)^{N_0}\times (1-p) \le \beta_0 $$
(57)

Appendix 2: Explanation of Lemma 1

Lemma 1

\(\sum_{i=0}^{k}\big({{u}\atop{i}}\big) \beta^i(1-\beta)^{u-i}\) is a non-increasing function of β for given u and k, where β ∈ (0, 1), k < u, and k, u are integers.

We give an explanation here:

The cumulative distribution function (c.d.f.) of the binomial distribution has the following form:

$$ F(k;u,\beta)=P(X\le k)=\sum_{i=0}^{k}\left( \begin{array}{c} u \\ i \\ \end{array} \right) \beta^i(1-\beta)^{u-i} =I_{1-\beta}(u-k,k+1) $$

where I_x(a, b) is the regularized incomplete beta function,

$$ I_{x}(a,b)=\frac{B(x;a,b)}{B(a,b)}=\frac{1}{B(a,b)}\int_{0}^{x}t^{a-1}(1-t)^{b-1}dt $$

and B(x; a, b) is the incomplete beta function, B(a, b) being the (complete) beta function. Thus,

$$ F(k;u,\beta)=P(X\le k)=\frac{1}{B(u-k,k+1)}\int_{0}^{1-\beta}t^{u-k-1}(1-t)^{k}dt $$

Since the integrand is nonnegative and the upper limit of integration, 1 − β, decreases as β increases, F(k; u, β) is a monotone non-increasing function of β, as claimed in Lemma 1.
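
This monotonicity can also be verified numerically. The sketch below uses SciPy's binomial c.d.f.; the particular values of u and k are arbitrary illustrations.

```python
import numpy as np
from scipy.stats import binom

u, k = 20, 5                          # arbitrary illustrative values with k < u
betas = np.linspace(0.01, 0.99, 99)
cdf_values = binom.cdf(k, u, betas)   # F(k; u, beta) = sum_{i<=k} C(u, i) beta^i (1 - beta)^(u - i)

# Non-increasing in beta: successive differences should never be positive (up to float tolerance).
assert np.all(np.diff(cdf_values) <= 1e-12)
print("F(k; u, beta) is non-increasing in beta over the tested grid.")
```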


About this article

Cite this article

Shen, Z., Ho, YC. & Zhao, QC. Ordinal Optimization and Quantification of Heuristic Designs. Discrete Event Dyn Syst 19, 317–345 (2009). https://doi.org/10.1007/s10626-009-0067-6
