Abstract
We characterize the best possible trade-off achievable when optimizing the construction of a decision tree with respect to both the worst-case and the expected cost. It is known that a decision tree achieving the minimum possible worst-case cost can behave very poorly in expectation (even exponentially worse than the optimal), and vice versa. Motivated by applications where choosing between the two optimization criteria is not easy, several authors have recently focused on the bicriteria optimization of decision trees. Here we sharply define the limits of the best possible trade-offs between expected and worst-case cost. More precisely, we show that for every \(\rho >0\) there is a decision tree D with worst testing cost at most \((1 + \rho )\textit{OPT}_W\) and expected testing cost at most \(\frac{1}{1 - e^{-\rho }} \textit{OPT}_E,\) where \(\textit{OPT}_W\) and \(\textit{OPT}_E\) denote the minimum worst testing cost and the minimum expected testing cost of a decision tree for the given instance. We also show that this trade-off is the best possible, in the sense that there are infinitely many instances for which no decision tree has both worst testing cost smaller than \((1 + \rho )\textit{OPT}_W\) and expected testing cost smaller than \(\frac{1}{1 - e^{-\rho }} \textit{OPT}_E\).
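The trade-off curve in the abstract is easy to evaluate numerically. The sketch below (the function names are our own, purely for illustration) tabulates the worst-cost factor \(1+\rho\) against the expected-cost factor \(\frac{1}{1-e^{-\rho}}\) for a few values of \(\rho\), showing that as \(\rho\) grows the worst-case guarantee degrades linearly while the expected-cost guarantee approaches the optimum:

```python
import math

def worst_factor(rho):
    # Multiplicative guarantee on the worst testing cost: (1 + rho) * OPT_W
    return 1.0 + rho

def expected_factor(rho):
    # Multiplicative guarantee on the expected testing cost:
    # OPT_E / (1 - e^{-rho})
    return 1.0 / (1.0 - math.exp(-rho))

for rho in (0.1, math.log(2), 1.0, 3.0):
    print(f"rho={rho:.3f}  worst={worst_factor(rho):.3f}  "
          f"expected={expected_factor(rho):.3f}")
```

For instance, at \(\rho = \ln 2\) the tree is within a factor \(1+\ln 2 \approx 1.693\) of the optimal worst cost and within a factor 2 of the optimal expected cost.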
Change history
20 March 2018
This erratum fixes a technical problem in the paper published in Algorithmica, Volume 79, Number 3, November 2017, pp. 886–908. Theorem 1 of this paper gives upper bounds on both worst testing cost and expected testing cost of the decision tree built by Algorithm 1.
Notes
For the sake of readability, here we use the notation \(Pr[\,]\) for probabilities.
Theorem 4.3.8 of [4] requires weaker conditions on f, h and g, but for our purposes it is enough to consider these stronger and better-known conditions on \(\mathbf {x}\).
References
Adler, M., Heeringa, B.: Approximating optimal binary decision trees. Algorithmica 62(3), 1112–1121 (2012)
Aslam, J.A., Rasala, A., Stein, C., Young, N.E.: Improved bicriteria existence theorems for scheduling. In: SODA 1999, pp. 846–847 (1999)
Alkhalid, A., Chikalov, I., Moshkov, M.: A tool for study of optimal decision trees. LNCS 6401, 353–360 (2010)
Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms, 2nd edn. Wiley, New York (1993)
Bellala, G., Bhavnani, S.K., Scott, C.: Group-based active query selection for rapid diagnosis in time-critical situations. IEEE-IT 58(1), 459–478 (2012)
Buro, M.: On the maximum length of Huffman codes. IPL 45(5), 219–223 (1993)
Chakaravarthy, V.T., Pandit, V., Roy, S., Awasthi, P., Mohania, M.: Decision trees for entity identification: approximation algorithms and hardness results. ACM Trans. Algorithms 7(2), 15:1–15:22 (2011)
Cicalese, F., Jacobs, T., Laber, E., Molinaro, M.: On greedy algorithms for decision trees. In: Proceedings of ISAAC (2010)
Cicalese, F., Laber, E., Saettler, A.: Diagnosis determination: decision trees optimizing simultaneously worst and expected testing cost. In: ICML2014, pp. 414–422 (2014)
Garey, M.R.: Optimal binary identification procedures. SIAM J. Appl. Math. 23(2), 173–186 (1972)
Garey, M.R.: Optimal binary search trees with restricted maximal depth. SIAM J. Comput. 3(2), 101–110 (1974)
Golovin, D., Krause, A., Ray, D.: Near-optimal Bayesian active learning with noisy observations. Adv. Neural Inf. Proc. Syst. 23, 766–774 (2010)
Guillory, A., Bilmes, J.: Average-case active learning with costs. In: ALT’09, pp. 141–155 (2009)
Guillory, A., Bilmes, J.: Interactive submodular set cover. In: Proceedings of ICML10, pp. 415–422 (2010)
Gupta, A., Nagarajan, V., Ravi, R.: Approximation algorithms for optimal decision trees and adaptive TSP problems. In: Proceedings of ICALP’10, pp. 690–701 (2010)
Hussain, S.: Relationships among various parameters for decision tree optimization. Stud. Comput. Intell. 514, 393–410 (2014)
Kelle, P., Schneider, H., Yi, H.: Decision alternatives between expected cost minimization and worst case scenario in emergency supply second revision. Int. J. Prod. Econ. 157, 250–260 (2014)
Kosaraju, S., Przytycka, T., Borgstrom, R.: On an optimal split tree problem. In: WADS 1999, pp. 157–168 (1999)
Krause, A.: Optimizing sensing: theory and applications. Ph.D. thesis, Carnegie Mellon University (2008)
Larmore, L.L.: Height restricted optimal binary trees. SICOMP 16(6), 1115–1123 (1987)
Larmore, L.L., Hirschberg, D.S.: A fast algorithm for optimal length-limited Huffman codes. J. ACM 37(3), 464–473 (1990)
Milidiú, R.L., Laber, E.S.: Bounding the inefficiency of length-restricted prefix codes. Algorithmica 31(4), 513–529 (2001)
Moshkov, M.J.: Greedy algorithm with weights for decision tree construction. Fundamenta Informaticae 104(3), 285–292 (2010)
Papadimitriou, C.H., Yannakakis, M.: On the approximability of trade-offs and optimal access of web sources. In: FOCS 2000, pp. 86–92 (2000)
Rasala, A., Stein, C., Torng, E., Uthaisombut, P.: Existence theorems, lower bounds and algorithms for scheduling to meet two objectives. In: SODA 2002, pp. 723–731 (2002)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press, Cambridge (2001)
Additional information
A correction to this article is available online at https://doi.org/10.1007/s00453-018-0423-8.
Appendix: Optimality of \(\mathbf {p}^*\)
Here, we show that the point \(\mathbf {p}^*\) defined in (9)–(10) is an optimal solution of the NLP (5)–(8) defined in Sect. 2. For that, we use Theorem 4.3.8 of [4] that we restate here in a simplified form.
Theorem 4
Let X be an open subset of \(R^n\). Consider the optimization problem P.
Let \(\mathbf {x}\) be a feasible solution and let \(I=\{i \mid g_i(\mathbf {x})=0\}\). Suppose there exist scalars \(u_i \ge 0\) for \(i=1,\ldots ,m\) and \(v_i\) for \(i=1,\ldots ,l\) such that
If f is linear, \(g_i\) is convex on X for each \(i \in I\), and each \(h_i\) is linear, then \(\mathbf {x}\) is an optimal solution of P.
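As an illustration of the sufficiency conditions in Theorem 4, the following sketch uses a toy problem of our own choosing (unrelated to the NLP (5)–(8)): minimize the linear objective \(f(x) = -x\) subject to the single convex constraint \(g_1(x) = x - 1 \le 0\) on \(X = \mathbb {R}\). It checks numerically that the candidate \(x = 1\) is feasible with the constraint active, and that a nonnegative multiplier makes the gradient of the Lagrangian vanish, so Theorem 4 certifies optimality:

```python
# Toy instance of Theorem 4: minimize f(x) = -x subject to g1(x) = x - 1 <= 0.
# f is linear and g1 is convex, so a feasible point satisfying the KKT
# conditions with a nonnegative multiplier is optimal.

def f_grad(x):
    return -1.0          # gradient of the linear objective f(x) = -x

def g1(x):
    return x - 1.0       # the single inequality constraint

def g1_grad(x):
    return 1.0           # gradient of g1

x = 1.0                              # candidate point
active = abs(g1(x)) < 1e-12          # constraint is tight, so I = {1}
u1 = 1.0                             # multiplier: f_grad + u1 * g1_grad = 0

stationary = abs(f_grad(x) + u1 * g1_grad(x)) < 1e-12
assert active and u1 >= 0 and stationary
print("x =", x, "is optimal by Theorem 4")
```

The same pattern — exhibit a feasible point, identify the active set, and produce nonnegative multipliers certifying stationarity — is exactly what the remainder of this appendix carries out for \(\mathbf {p}^*\).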
The nonlinear problem defined in (5)–(8) can be rewritten in terms of problem P. In fact, for a point \(\mathbf {p}=(p_1,\ldots ,p_{C+1},z)\) we have that \(f(\mathbf {p}) = - z \),
for \(j=0,\ldots ,C\) and
for \(j=C+1,\ldots ,2C+1\).
We define an open set X that contains all feasible solutions of the problem defined in (5)–(8). The motivation is to meet the conditions of Theorem 3.3.7 of [4], which will be used to establish the convexity of \(g_j\).
Let
We have to prove that: (a) \(\mathbf {p}^*\) is feasible and \(I=\{0,\ldots ,C\}\); (b) f, \(h_1\) and \(g_j\) satisfy the conditions of Theorem 4; and (c) there are multipliers satisfying condition (30).
1.1 Feasibility of \(\mathbf {p}^*\)
We have that \(\mathbf {p}^*\) is feasible because
where the last expression holds because
We have that \(I=\{0,\ldots ,C\}\) because
for \(j>C\).
1.2 Convexity of \(g_j\)
Because f and \(h_1\) are linear and \(I=\{0,\ldots ,C\}\), we only need to prove that \(g_i\), for \(i=0,\ldots ,C\), is convex in the open set X.
Since \(g_j\), for \(j=0,\ldots ,C\), is twice differentiable, it follows from Theorem 3.3.7 of [4] that it is enough to show that the Hessian of \(g_j\), for \(j=0,\ldots ,C\), is positive semidefinite on X. In fact, the Hessian of \(g_j\) is a matrix in which all entries are zero except for the first \(C+1\) entries of the last row and the first \(C+1\) entries of the last column. The matrix has the structure presented below.
We have that
for all \(\mathbf {p}=(p_1,\ldots ,p_{C+1},z) \in X\) because \(z>0\) and
1.3 KKT Conditions
Let \(\lambda _0,\ldots ,\lambda _C\) be the dual variables associated with the constraints \(g_j\), \(j=0,\ldots ,C\). Let \(\lambda _E\) be the dual variable associated with the constraint \(\sum _{ i=1}^{C+1} p_i = 1\).
The multipliers \((\lambda _0,\ldots ,\lambda _C,\lambda _E)\) must satisfy
Thus, we must have
for \(i=1,\ldots ,C+1\), and also
Let
By subtracting Eq. (31) for \(i=C\) from the same equation for \(i=C+1\), using Eq. (32), and using the fact that \(E \cdot z^* = W\), we get that \(\lambda _C=\frac{1}{E^2}\). In addition, we can prove by induction that \(\lambda _{j-1}=\lambda _j \frac{W-1}{W}\): subtract Eq. (31) for \(i=j-1\) from the same equation for \(i=j\), and apply the induction hypothesis. Thus, we get that
for every i, so that all multipliers are positive.
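The recurrence above, together with the base case \(\lambda _C = 1/E^2\), gives the closed form \(\lambda _j = \frac{1}{E^2}\left(\frac{W-1}{W}\right)^{C-j}\), which is manifestly positive. The sketch below checks this numerically (the values of W, E and C are sample values of our own choosing, not from the paper):

```python
# Illustrative check of the multiplier recurrence: lambda_C = 1/E^2 and
# lambda_{j-1} = lambda_j * (W-1)/W, which gives the closed form
# lambda_j = (1/E^2) * ((W-1)/W)^(C-j).  W, E, C below are sample values.
W, E, C = 8.0, 3.0, 5

lam = [0.0] * (C + 1)
lam[C] = 1.0 / E**2
for j in range(C, 0, -1):
    lam[j - 1] = lam[j] * (W - 1.0) / W

# All multipliers are positive, as required for the KKT conditions.
assert all(l > 0 for l in lam)

# The recurrence matches the closed form for every index.
for j in range(C + 1):
    assert abs(lam[j] - (1.0 / E**2) * ((W - 1.0) / W) ** (C - j)) < 1e-12
print("multipliers:", lam)
```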
Saettler, A., Laber, E. & Cicalese, F. Trading Off Worst and Expected Cost in Decision Tree Problems. Algorithmica 79, 886–908 (2017). https://doi.org/10.1007/s00453-016-0211-2