Trading Off Worst and Expected Cost in Decision Tree Problems


A Correction to this article was published on 20 March 2018

Abstract

We characterize the best possible trade-off achievable when optimizing the construction of a decision tree with respect to both the worst and the expected cost. It is known that a decision tree achieving the minimum possible worst case cost can behave very poorly in expectation (even exponentially worse than optimal), and vice versa. Motivated by applications where deciding which optimization criterion to adopt might not be easy, several authors have recently focused on the bicriteria optimization of decision trees. Here we sharply define the limits of the best possible trade-offs between expected and worst case cost. More precisely, we show that for every \(\rho >0\) there is a decision tree D with worst testing cost at most \((1 + \rho )\textit{OPT}_W\) and expected testing cost at most \(\frac{1}{1 - e^{-\rho }} \textit{OPT}_E,\) where \(\textit{OPT}_W\) and \(\textit{OPT}_E\) denote the minimum worst testing cost and the minimum expected testing cost of a decision tree for the given instance. We also show that this is the best possible trade-off, in the sense that there are infinitely many instances for which we cannot obtain a decision tree with both worst testing cost smaller than \((1 + \rho )\textit{OPT}_W\) and expected testing cost smaller than \(\frac{1}{1 - e^{-\rho }} \textit{OPT}_E\).
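To make the trade-off concrete, the following small Python sketch (our illustration, not part of the paper) tabulates the two guarantee factors \((1+\rho)\) and \(\frac{1}{1-e^{-\rho}}\) from the theorem above for a few values of \(\rho\), showing how tightening the worst-case guarantee loosens the expected-cost guarantee and vice versa.

```python
import math

def tradeoff_factors(rho: float) -> tuple[float, float]:
    """Return the (worst-case, expected-cost) approximation factors
    guaranteed by the trade-off theorem for a given rho > 0."""
    worst_factor = 1.0 + rho                          # worst cost <= (1 + rho) * OPT_W
    expected_factor = 1.0 / (1.0 - math.exp(-rho))    # expected cost <= factor * OPT_E
    return worst_factor, expected_factor

if __name__ == "__main__":
    for rho in (0.1, 0.5, 1.0, 2.0, 4.0):
        w, e = tradeoff_factors(rho)
        print(f"rho = {rho:3.1f}: worst x{w:5.2f}, expected x{e:5.2f}")
```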


Change history

  • 20 March 2018

    This erratum fixes a technical problem in the paper published in Algorithmica, Volume 79, Number 3, November 2017, pp. 886–908. Theorem 1 of this paper gives upper bounds on both worst testing cost and expected testing cost of the decision tree built by Algorithm 1.

Notes

  1. For the sake of readability, here we use the notation \(Pr[\,]\) for the probability of objects.

  2. Theorem 4.3.8 of [4] requires weaker conditions on f, h and g, but for our purposes it is enough to consider these stronger and better-known conditions.

References

  1. Adler, M., Heeringa, B.: Approximating optimal binary decision trees. Algorithmica 62(3), 1112–1121 (2012)

  2. Aslam, J.A., Rasala, A., Stein, C., Young, N.E.: Improved bicriteria existence theorems for scheduling. In: SODA 1999, pp. 846–847 (1999)

  3. Alkhalid, A., Chikalov, I., Moshkov, M.: A tool for study of optimal decision trees. LNCS 6401, 353–360 (2010)

  4. Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms, 2nd edn. Wiley, New York (1993)

  5. Bellala, G., Bhavnani, S.K., Scott, C.: Group-based active query selection for rapid diagnosis in time-critical situations. IEEE-IT 58(1), 459–478 (2012)

  6. Buro, M.: On the maximum length of Huffman codes. IPL 45(5), 219–223 (1993)

  7. Chakaravarthy, V.T., Pandit, V., Roy, S., Awasthi, P., Mohania, M.: Decision trees for entity identification: approximation algorithms and hardness results. ACM Trans. Algorithms 7(2), 15:1–15:22 (2011)

  8. Cicalese, F., Jacobs, T., Laber, E., Molinaro, M.: On greedy algorithms for decision trees. In: Proceedings of ISAAC (2010)

  9. Cicalese, F., Laber, E., Saettler, A.: Diagnosis determination: decision trees optimizing simultaneously worst and expected testing cost. In: ICML2014, pp. 414–422 (2014)

  10. Garey, M.R.: Optimal binary identification procedures. SIAM J. Appl. Math. 23(2), 173–186 (1972)

  11. Garey, M.R.: Optimal binary search trees with restricted maximal depth. SIAM J. Comput. 3(2), 101–110 (1974)

  12. Golovin, D., Krause, A., Ray, D.: Near-optimal Bayesian active learning with noisy observations. Adv. Neural Inf. Proc. Syst. 23, 766–774 (2010)

  13. Guillory, A., Bilmes, J.: Average-case active learning with costs. In: ALT’09, pp. 141–155 (2009)

  14. Guillory, A., Bilmes, J.: Interactive submodular set cover. In: Proceedings of ICML10, pp. 415–422 (2010)

  15. Gupta, A., Nagarajan, V., Ravi, R.: Approximation algorithms for optimal decision trees and adaptive TSP problems. In: Proceedings of ICALP’10, pp. 690–701 (2010)

  16. Hussain, S.: Relationships among various parameters for decision tree optimization. Stud. Comput. Intell. 514, 393–410 (2014)

  17. Kelle, P., Schneider, H., Yi, H.: Decision alternatives between expected cost minimization and worst case scenario in emergency supply second revision. Int. J. Prod. Econ. 157, 250–260 (2014)

  18. Kosaraju, S., Przytycka, T., Borgstrom, R.: On an optimal split tree problem. In: WADS 1999, pp. 157–168 (1999)

  19. Krause, A.: Optimizing sensing: theory and applications. Ph.D. thesis, Carnegie Mellon University (2008)

  20. Larmore, L.L.: Height restricted optimal binary trees. SICOMP 16(6), 1115–1123 (1987)

  21. Larmore, L.L., Hirschberg, D.S.: A fast algorithm for optimal length-limited Huffman codes. J. ACM 37(3), 464–473 (1990)

  22. Milidiú, R.L., Laber, E.S.: Bounding the inefficiency of length-restricted prefix codes. Algorithmica 31(4), 513–529 (2001)

  23. Moshkov, M.J.: Greedy algorithm with weights for decision tree construction. Fundamenta Informaticae 104(3), 285–292 (2010)

  24. Papadimitriou, C.H., Yannakakis, M.: On the approximability of trade-offs and optimal access of web sources. In: FOCS 2000, pp. 86–92 (2000)

  25. Rasala, A., Stein, C., Torng, E., Uthaisombut, P.: Existence theorems, lower bounds and algorithms for scheduling to meet two objectives. In: SODA 2002, pp. 723–731 (2002)

  26. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press, Cambridge (2001)

Corresponding author

Correspondence to Aline Saettler.

Additional information

A correction to this article is available online at https://doi.org/10.1007/s00453-018-0423-8.

Appendix: Optimality of \(\mathbf {p}^*\)

Here, we show that the point \(\mathbf {p}^*\) defined in (9)–(10) is an optimal solution of the NLP (5)–(8) defined in Sect. 2. For that, we use Theorem 4.3.8 of [4] that we restate here in a simplified form.

Theorem 4

Let X be an open subset of \(R^n\). Consider the optimization problem P.

$$\begin{aligned} \text{ Minimize }&f(\mathbf {x}) \end{aligned}$$
(26)
$$\begin{aligned} \text{ subject } \text{ to }&g_i(\mathbf {x}) \le 0&\quad \,\,\, \text{ for } i=1,\ldots ,m \end{aligned}$$
(27)
$$\begin{aligned}&h_i(\mathbf {x}) = 0&\quad \,\,\, \text{ for } i=1,\ldots ,l \end{aligned}$$
(28)
$$\begin{aligned}&\mathbf {x} \in X \end{aligned}$$
(29)

Let \(\mathbf {x}\) be a feasible solution and let \(I=\{i \mid g_i(\mathbf {x})=0\}\). Suppose there exist scalars \(u_i \ge 0\) for \(i=1,\ldots ,m\) and \(v_i\) for \(i=1,\ldots ,l\) such that

$$\begin{aligned} - \nabla f(\mathbf {x}) = \sum _{i \in I } u_i \nabla g_i(\mathbf {x}) + \sum _{i=1}^{l} v_i \nabla h_i(\mathbf {x}) \end{aligned}$$
(30)

If f is linear, \(g_i\) is convex on X for every \(i \in I\), and each \(h_i\) is linear, then \(\mathbf {x}\) is an optimal solution of P (see Note 2).
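The sufficiency statement is easy to exercise numerically. The sketch below (our illustration, on a hypothetical toy instance that is not from the paper) checks condition (30) for the convex problem of minimizing \(x_1 + x_2\) subject to \(x_1^2 + x_2^2 - 2 \le 0\): at \(\mathbf{x}=(-1,-1)\) the constraint is active and the multiplier \(u = 1/2 \ge 0\) satisfies \(-\nabla f = u \nabla g\), so Theorem 4 certifies that the point is optimal.

```python
import numpy as np

# Hypothetical toy instance, for illustration only:
# minimize f(x) = x1 + x2   subject to   g(x) = x1^2 + x2^2 - 2 <= 0.
x = np.array([-1.0, -1.0])     # candidate point; the constraint is tight here

grad_f = np.array([1.0, 1.0])  # gradient of the linear objective
grad_g = 2.0 * x               # gradient of the convex constraint at x
u = 0.5                        # candidate multiplier for the active constraint

# Condition (30): -grad f = u * grad g (only active constraints appear).
assert u >= 0 and np.allclose(-grad_f, u * grad_g)
print("KKT sufficient condition holds at", x, "with multiplier u =", u)
```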

The nonlinear problem defined in (5)–(8) can be rewritten in terms of problem P. In fact, for a point \(\mathbf {p}=(p_1,\ldots ,p_{C+1},z)\) we have that \(f(\mathbf {p}) = - z \),

$$\begin{aligned} h_1(\mathbf {p}) &= 1 - \sum _{i=1}^{C+1} p_i,\\ g_j(\mathbf {p}) &= z \left( \sum _{i=1}^{C+1} i \cdot p_i \right) - \sum _{i=1}^j i \cdot p_i - (j+W) \left( \sum _{i=j+1}^{C+1} p_i \right) , \end{aligned}$$

for \(j=0,\ldots ,C\) and

$$\begin{aligned} g_j(\mathbf {p}) = -p_{j-C} \end{aligned}$$

for \(j=C+1,\ldots ,2C+1\).
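This reformulation is straightforward to check numerically. The following sketch (ours, not from the paper) sets up the objective \(f(\mathbf{p})=-z\), the equality \(h_1\), and the inequalities \(g_j\) for a small hypothetical instance with \(C=3\) and \(W=4\) and hands them to scipy.optimize.minimize; since the \(g_j\) are bilinear in z and the \(p_i\), SLSQP is only guaranteed to return a local optimum, but the run is a convenient sanity check on the constraint definitions.

```python
import numpy as np
from scipy.optimize import minimize

C, W = 3, 4.0      # small hypothetical instance, for illustration only
n = C + 1          # number of p-variables; the full vector is (p_1, ..., p_{C+1}, z)

def objective(x):
    # f(p) = -z: maximizing z is the same as minimizing -z.
    return -x[-1]

def g(x, j):
    # g_j(p) = z * sum_i i*p_i - sum_{i<=j} i*p_i - (j+W) * sum_{i>j} p_i,  j = 0..C
    p, z = x[:n], x[-1]
    idx = np.arange(1, n + 1)
    return z * np.dot(idx, p) - np.dot(idx[:j], p[:j]) - (j + W) * p[j:].sum()

constraints = [{"type": "eq", "fun": lambda x: 1.0 - x[:n].sum()}]                        # h_1(p) = 0
constraints += [{"type": "ineq", "fun": lambda x, j=j: -g(x, j)} for j in range(C + 1)]   # g_j(p) <= 0
constraints += [{"type": "ineq", "fun": lambda x, i=i: x[i]} for i in range(n)]           # p_i >= 0

x0 = np.append(np.full(n, 1.0 / n), 1.0)   # uniform starting point with z = 1
res = minimize(objective, x0, constraints=constraints, method="SLSQP")
print("numerical z* ~", round(res.x[-1], 4))
print("numerical p* ~", np.round(res.x[:n], 4))
```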

We define an open set X that contains all feasible solutions of the problem defined in (5)–(8). The motivation is to meet the conditions of Theorem 3.3.7 of [4], which will be used to establish the convexity of \(g_i\).

Let

$$\begin{aligned} X = \left\{ (p_1,p_2,\ldots ,p_{C+1}, z) \ \middle|\ \sum _{i=1}^{C+1} p_i> 1-\frac{1}{2(C+1)} \text{ and } p_i> -\frac{1}{2(C+1)^2} \text{ and } z > 0 \right\} . \end{aligned}$$

We have to prove that: (a) \(\mathbf {p}^*\) is feasible and \(I=\{0,\ldots ,C\}\); (b) f, \(h_1\) and \(g_i\) satisfy the conditions of Theorem 4; and (c) there are multipliers satisfying condition (30).

1.1 Feasibility of \(\mathbf {p}^*\)

We have that \(\mathbf {p}^*\) is feasible because

$$\begin{aligned} g_j(\mathbf {p}^*) &= z^* \left( \sum _{i=1}^{C+1} i \cdot p^*_i \right) - \sum _{i=1}^j i \cdot p^*_i - (j+W) \left( \sum _{i=j+1}^{C+1} p^*_i \right) \\ &= z^* \left( W-(C+W)\frac{(W-1)^C}{W^C} + (C+1)\frac{(W-1)^C}{W^C} \right) - \left( W - (j+W)\frac{(W-1)^j}{W^j} \right) -(j+W)\frac{(W-1)^j}{W^j} \\ &= z^* \left( W - \frac{(W-1)^{C+1}}{W^{C+1}} \right) -W = 0, \end{aligned}$$

where the last expression holds because

$$\begin{aligned} z^* =\frac{W}{W-\frac{(W-1)^{C+1}}{W^{C+1}} } \end{aligned}$$

We have that \(I=\{0,\ldots ,C\}\) because

$$\begin{aligned} g_j(\mathbf {p}^*) = -p^*_{j-C} = -\frac{(W-1)^{j-C-1}}{W^{j-C}} <0, \end{aligned}$$

for \(j>C\).

1.2 Convexity of \(g_j\)

Because f and \(h_1\) are linear and \(I=\{0,\ldots ,C\}\), we only need to prove that \(g_i\), for \(i=0,\ldots ,C\), is convex in the open set X.

Since \(g_j\), for \(j=0,\ldots ,C\), is twice differentiable, it follows from Theorem 3.3.7 of [4] that it is enough to show that the Hessian of \(g_j\), for \(j=0,\ldots ,C\), is positive semidefinite on X. In fact, the Hessian of \(g_j\) is a matrix in which all entries are zero, except the first \(C+1\) entries of the last row and the first \(C+1\) entries of the last column. The matrix has the structure presented below.

$$\begin{aligned} H = \begin{pmatrix} 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 0 & 2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 0 & C+1 \\ 1 & 2 & \cdots & C+1 & 0 \end{pmatrix} . \end{aligned}$$

We have that

$$\begin{aligned} \mathbf {p}^{T} H \mathbf {p} = 2 z ( p_1 + 2p_2 + \cdots + (C+1)p_{C+1} ) \ge 0 \end{aligned}$$

for all \(\mathbf {p}=(p_1,\ldots ,p_{C+1},z) \in X\) because \(z>0\) and

$$\begin{aligned} \sum _{ i} i \cdot p_i \ge \sum _{ i } p_i - (C+1) \frac{1}{2(C+1)^2}> 1- \frac{1}{2(C+1)}- \frac{1}{2(C+1)}>0 \end{aligned}$$
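This argument can be illustrated numerically (our sketch, with a hypothetical small value of C): build the matrix H above and evaluate the quadratic form \(\mathbf{p}^{T} H \mathbf{p}\) at random points of X; the form is nonnegative at all of them even though H itself is indefinite.

```python
import numpy as np

C = 4              # hypothetical small value, for illustration only
n = C + 2          # the variables are (p_1, ..., p_{C+1}, z)

# Hessian of g_j: zero everywhere except the last row and last column,
# whose first C+1 entries are 1, 2, ..., C+1.
H = np.zeros((n, n))
H[-1, :C + 1] = np.arange(1, C + 2)
H[:C + 1, -1] = np.arange(1, C + 2)

rng = np.random.default_rng(0)
lo = -1.0 / (2 * (C + 1) ** 2)
checked = 0
for _ in range(2000):
    p = rng.uniform(lo, 1.0, size=C + 1)
    z = rng.uniform(1e-3, 10.0)
    if p.sum() <= 1 - 1.0 / (2 * (C + 1)):
        continue                               # point not in X, skip it
    x = np.append(p, z)
    quad = x @ H @ x                           # equals 2 * z * sum_i i * p_i
    assert quad >= 0.0, "quadratic form should be nonnegative on X"
    checked += 1
print(f"p^T H p >= 0 held on {checked} sampled points of X")
```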

1.3 KKT Conditions

Let \(\lambda _0,\ldots ,\lambda _C\) be the dual variables associated with the constraints \(g_j\), \(j=0,\ldots ,C\). Let \(\lambda _E\) be the dual variable associated with the constraint \(\sum _{ i=1}^{C+1} p_i = 1\).

The multipliers \((\lambda _0,\ldots ,\lambda _C,\lambda _E)\) must satisfy

$$\begin{aligned} - \nabla f(\mathbf {p}^*) = \sum _{i=0}^{C} \lambda _i \nabla g_i(\mathbf {p}^*) + \lambda _E \nabla h_1(\mathbf {p}^*), \qquad \lambda _i \ge 0 \ \text{ for } i=0,\ldots ,C. \end{aligned}$$

Thus, we must have

$$\begin{aligned} \sum _{j=0}^{i-1} \lambda _j (j+W) + \sum _{j=i}^{C} \lambda _j\, i + \lambda _{E} = z^*\, i \sum _{j=0}^{C} \lambda _j \end{aligned}$$
(31)

for \(i=1,\ldots ,C+1\), and also

$$\begin{aligned} \sum _{j=0}^{C} \lambda _j \left( \sum _{i=1}^{C+1} i \cdot p^*_{i} \right) =1 \end{aligned}$$
(32)

Let

$$\begin{aligned} E= \sum _{i=1}^{C+1} i \cdot p^*_{i} \end{aligned}$$

By subtracting Eq. (31) with \(i=C\) from the same equation with \(i=C+1\), using Eq. (32), and using the fact that \(E \cdot z^* = W\), we get \(\lambda _C=\frac{1}{E^2}\). In addition, we can prove by induction that \(\lambda _{j-1}=\lambda _j \frac{W-1}{W}\): subtract Eq. (31) with \(i=j-1\) from the same equation with \(i=j\) and use the induction hypothesis. Thus, we get that

$$\begin{aligned} \lambda _i =\frac{(W-1)^{C-i}}{W^{C-i} E^2}, \end{aligned}$$

for every i, so all multipliers are positive.


Cite this article

Saettler, A., Laber, E. & Cicalese, F. Trading Off Worst and Expected Cost in Decision Tree Problems. Algorithmica 79, 886–908 (2017). https://doi.org/10.1007/s00453-016-0211-2
