Abstract
We characterize the best possible trade-off achievable when optimizing the construction of a decision tree with respect to both the worst-case and the expected cost. It is known that a decision tree achieving the minimum possible worst-case cost can behave very poorly in expectation (even exponentially worse than the optimal), and vice versa. Motivated by applications where choosing between the two optimization criteria is not easy, several authors have recently focused on the bicriteria optimization of decision trees. Here we sharply define the limits of the best possible trade-offs between expected and worst-case cost. More precisely, we show that for every \(\rho >0\) there is a decision tree D with worst testing cost at most \((1 + \rho )\textit{OPT}_W\) and expected testing cost at most \(\frac{1}{1 - e^{-\rho }} \textit{OPT}_E,\) where \(\textit{OPT}_W\) and \(\textit{OPT}_E\) denote the minimum worst testing cost and the minimum expected testing cost of a decision tree for the given instance. We also show that this trade-off is the best possible, in the sense that there are infinitely many instances for which no decision tree has both worst testing cost smaller than \((1 + \rho )\textit{OPT}_W\) and expected testing cost smaller than \(\frac{1}{1 - e^{-\rho }} \textit{OPT}_E\).
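The trade-off curve in the abstract is easy to evaluate numerically. The sketch below (the function names are our own, purely for illustration) tabulates the worst-cost factor \(1+\rho\) against the expected-cost factor \(\frac{1}{1-e^{-\rho}}\) for a few values of \(\rho\), showing that as \(\rho\) grows the worst-case guarantee degrades linearly while the expected-cost guarantee approaches the optimum:

```python
import math

def worst_factor(rho):
    # Multiplicative guarantee on the worst testing cost: (1 + rho) * OPT_W
    return 1.0 + rho

def expected_factor(rho):
    # Multiplicative guarantee on the expected testing cost:
    # OPT_E / (1 - e^{-rho})
    return 1.0 / (1.0 - math.exp(-rho))

for rho in (0.1, math.log(2), 1.0, 3.0):
    print(f"rho={rho:.3f}  worst={worst_factor(rho):.3f}  "
          f"expected={expected_factor(rho):.3f}")
```

For instance, at \(\rho = \ln 2\) the tree is within a factor \(1+\ln 2 \approx 1.693\) of the optimal worst cost and within a factor 2 of the optimal expected cost.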
Change history
20 March 2018
This erratum fixes a technical problem in the paper published in Algorithmica, Volume 79, Number 3, November 2017, pp. 886–908. Theorem 1 of this paper gives upper bounds on both worst testing cost and expected testing cost of the decision tree built by Algorithm 1.
Notes
For the sake of readability, here we use the notation \(Pr[\,]\) for probabilities.
Theorem 4.3.8 of [4] requires weaker conditions on f, h and g, but for our purposes it is enough to consider these stronger and better-known conditions on \(\mathbf {x}\).
References
Adler, M., Heeringa, B.: Approximating optimal binary decision trees. Algorithmica 62(3), 1112–1121 (2012)
Aslam, J.A., Rasala, A., Stein, C., Young, N.E.: Improved bicriteria existence theorems for scheduling. In: SODA 1999, pp. 846–847 (1999)
Alkhalid, A., Chikalov, I., Moshkov, M.: A tool for study of optimal decision trees. LNCS 6401, 353–360 (2010)
Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms, 2nd edn. Wiley, New York (1993)
Bellala, G., Bhavnani, S.K., Scott, C.: Group-based active query selection for rapid diagnosis in time-critical situations. IEEE-IT 58(1), 459–478 (2012)
Buro, M.: On the maximum length of Huffman codes. IPL 45(5), 219–223 (1993)
Chakaravarthy, V.T., Pandit, V., Roy, S., Awasthi, P., Mohania, M.: Decision trees for entity identification: approximation algorithms and hardness results. ACM Trans. Algorithms 7(2), 15:1–15:22 (2011)
Cicalese, F., Jacobs, T., Laber, E., Molinaro, M.: On greedy algorithms for decision trees. In: Proceedings of ISAAC (2010)
Cicalese, F., Laber, E., Saettler, A.: Diagnosis determination: decision trees optimizing simultaneously worst and expected testing cost. In: ICML2014, pp. 414–422 (2014)
Garey, M.R.: Optimal binary identification procedures. SIAM J. Appl. Math. 23(2), 173–186 (1972)
Garey, M.R.: Optimal binary search trees with restricted maximal depth. SIAM J. Comput. 3(2), 101–110 (1974)
Golovin, D., Krause, A., Ray, D.: Near-optimal Bayesian active learning with noisy observations. Adv. Neural Inf. Proc. Syst. 23, 766–774 (2010)
Guillory, A., Bilmes, J.: Average-case active learning with costs. In: ALT’09, pp. 141–155 (2009)
Guillory, A., Bilmes, J.: Interactive submodular set cover. In: Proceedings of ICML10, pp. 415–422 (2010)
Gupta, A., Nagarajan, V., Ravi, R.: Approximation algorithms for optimal decision trees and adaptive TSP problems. In: Proceedings of ICALP’10, pp. 690–701 (2010)
Hussain, S.: Relationships among various parameters for decision tree optimization. Stud. Comput. Intell. 514, 393–410 (2014)
Kelle, P., Schneider, H., Yi, H.: Decision alternatives between expected cost minimization and worst case scenario in emergency supply second revision. Int. J. Prod. Econ. 157, 250–260 (2014)
Kosaraju, S., Przytycka, T., Borgstrom, R.: On an optimal split tree problem. In: WADS 1999, pp. 157–168 (1999)
Krause, A.: Optimizing sensing: theory and applications. Ph.D. thesis, Carnegie Mellon University (2008)
Larmore, L.L.: Height restricted optimal binary trees. SICOMP 16(6), 1115–1123 (1987)
Larmore, L.L., Hirschberg, D.S.: A fast algorithm for optimal length-limited Huffman codes. J. ACM 37(3), 464–473 (1990)
Milidiú, R.L., Laber, E.S.: Bounding the inefficiency of length-restricted prefix codes. Algorithmica 31(4), 513–529 (2001)
Moshkov, M.J.: Greedy algorithm with weights for decision tree construction. Fundamenta Informaticae 104(3), 285–292 (2010)
Papadimitriou, C.H., Yannakakis, M.: On the approximability of trade-offs and optimal access of web sources. In: FOCS 2000, pp. 86–92 (2000)
Rasala, A., Stein, C., Torng, E., Uthaisombut, P.: Existence theorems, lower bounds and algorithms for scheduling to meet two objectives. In: SODA 2002, pp. 723–731 (2002)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press, Cambridge (2001)
Additional information
A correction to this article is available online at https://doi.org/10.1007/s00453-018-0423-8.
Appendix: Optimality of \(\mathbf {p}^*\)
Here, we show that the point \(\mathbf {p}^*\) defined in (9)–(10) is an optimal solution of the NLP (5)–(8) defined in Sect. 2. For that, we use Theorem 4.3.8 of [4] that we restate here in a simplified form.
Theorem 4
Let X be an open subset of \(R^n\). Consider the optimization problem P.
Let \(\mathbf {x}\) be a feasible solution and let \(I=\{i \mid g_i(\mathbf {x})=0\}\). Suppose there exist scalars \(u_i \ge 0\) for \(i=1,\ldots ,m\) and \(v_i\) for \(i=1,\ldots ,l\) such that
If f is linear, \(g_i\) is convex on X for each \(i \in I\), and each \(h_i\) is linear, then \(\mathbf {x}\) is an optimal solution of P.
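As an illustration of the sufficiency conditions in Theorem 4, the following sketch uses a toy problem of our own choosing (unrelated to the NLP (5)–(8)): minimize the linear objective \(f(x) = -x\) subject to the single convex constraint \(g_1(x) = x - 1 \le 0\) on \(X = \mathbb {R}\). It checks numerically that the candidate \(x = 1\) is feasible with the constraint active, and that a nonnegative multiplier makes the gradient of the Lagrangian vanish, so Theorem 4 certifies optimality:

```python
# Toy instance of Theorem 4: minimize f(x) = -x subject to g1(x) = x - 1 <= 0.
# f is linear and g1 is convex, so a feasible point satisfying the KKT
# conditions with a nonnegative multiplier is optimal.

def f_grad(x):
    return -1.0          # gradient of the linear objective f(x) = -x

def g1(x):
    return x - 1.0       # the single inequality constraint

def g1_grad(x):
    return 1.0           # gradient of g1

x = 1.0                              # candidate point
active = abs(g1(x)) < 1e-12          # constraint is tight, so I = {1}
u1 = 1.0                             # multiplier: f_grad + u1 * g1_grad = 0

stationary = abs(f_grad(x) + u1 * g1_grad(x)) < 1e-12
assert active and u1 >= 0 and stationary
print("x =", x, "is optimal by Theorem 4")
```

The same pattern — exhibit a feasible point, identify the active set, and produce nonnegative multipliers certifying stationarity — is exactly what the remainder of this appendix carries out for \(\mathbf {p}^*\).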
The nonlinear problem defined in (5)–(8) can be rewritten in terms of problem P. In fact, for a point \(\mathbf {p}=(p_1,\ldots ,p_{C+1},z)\) we have that \(f(\mathbf {p}) = - z \),
for \(j=0,\ldots ,C\) and
for \(j=C+1,\ldots ,2C+1\).
We define an open set X that contains all feasible solutions of the problem defined in (5)–(8). The motivation is to meet the conditions of Theorem 3.3.7 of [4], which will be used to establish the convexity of \(g_j\).
Let
We have to prove that: (a) \(\mathbf {p}^*\) is feasible and \(I=\{0,\ldots ,C\}\); (b) f, \(h_1\) and \(g_j\) satisfy the conditions of Theorem 4; and (c) there are multipliers satisfying condition (30).
1.1 Feasibility of \(\mathbf {p}^*\)
We have that \(\mathbf {p}^*\) is feasible because
where the last expression holds because
We have that \(I=\{0,\ldots ,C\}\) because
for \(j>C\).
1.2 Convexity of \(g_j\)
Because f and \(h_1\) are linear and \(I=\{0,\ldots ,C\}\), we only need to prove that \(g_i\), for \(i=0,\ldots ,C\), is convex in the open set X.
Since \(g_j\), for \(j=0,\ldots ,C\), is twice differentiable, it follows from Theorem 3.3.7 of [4] that it is enough to show that the Hessian of \(g_j\), for \(j=0,\ldots ,C\), is positive semidefinite on X. In fact, the Hessian of \(g_j\) is a matrix in which all entries are zero except for the first \(C+1\) entries of the last row and the first \(C+1\) entries of the last column. The matrix has the structure presented below.
We have that
for all \(\mathbf {p}=(p_1,\ldots ,p_{C+1},z) \in X\) because \(z>0\) and
1.3 KKT Conditions
Let \(\lambda _0,\ldots ,\lambda _C\) be the dual variables associated with the constraints \(g_j\), \(j=0,\ldots ,C\). Let \(\lambda _E\) be the dual variable associated with the constraint \(\sum _{ i=1}^{C+1} p_i = 1\).
The multipliers \((\lambda _0,\ldots ,\lambda _C,\lambda _E)\) must satisfy
Thus, we must have
for \(i=1,\ldots ,C+1\), and also
Let
By subtracting Eq. (31) for \(i=C\) from the same equation for \(i=C+1\), using Eq. (32), and using the fact that \(E \cdot z^* = W\), we get that \(\lambda _C=\frac{1}{E^2}\). In addition, we can prove by induction that \(\lambda _{j-1}=\lambda _j \frac{W-1}{W}\): subtract Eq. (31) for \(i=j-1\) from the same equation for \(i=j\), and apply the induction hypothesis. Thus, we get that
for every i, so that all multipliers are positive.
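The recurrence above, together with the base case \(\lambda _C = 1/E^2\), gives the closed form \(\lambda _j = \frac{1}{E^2}\left(\frac{W-1}{W}\right)^{C-j}\), which is manifestly positive. The sketch below checks this numerically (the values of W, E and C are sample values of our own choosing, not from the paper):

```python
# Illustrative check of the multiplier recurrence: lambda_C = 1/E^2 and
# lambda_{j-1} = lambda_j * (W-1)/W, which gives the closed form
# lambda_j = (1/E^2) * ((W-1)/W)^(C-j).  W, E, C below are sample values.
W, E, C = 8.0, 3.0, 5

lam = [0.0] * (C + 1)
lam[C] = 1.0 / E**2
for j in range(C, 0, -1):
    lam[j - 1] = lam[j] * (W - 1.0) / W

# All multipliers are positive, as required for the KKT conditions.
assert all(l > 0 for l in lam)

# The recurrence matches the closed form for every index.
for j in range(C + 1):
    assert abs(lam[j] - (1.0 / E**2) * ((W - 1.0) / W) ** (C - j)) < 1e-12
print("multipliers:", lam)
```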
Saettler, A., Laber, E. & Cicalese, F. Trading Off Worst and Expected Cost in Decision Tree Problems. Algorithmica 79, 886–908 (2017). https://doi.org/10.1007/s00453-016-0211-2