Abstract
In this paper, we study how well one can approximate arbitrary polytopes using sparse inequalities. Our motivation comes from the use of sparse cutting-planes in mixed-integer programming (MIP) solvers, since they help in solving the linear programs encountered during branch-and-bound more efficiently. However, how well can we approximate the integer hull by just using sparse cutting-planes? In order to understand this question better, given a polytope \(P\) (e.g. the integer hull of a MIP), let \(P^k\) be its best approximation using cuts with at most \(k\) non-zero coefficients. We consider \({\text {d}}(P, P^k) = \max _{x \in P^k} \left( \min _{y \in P} \Vert x - y\Vert \right) \) as a measure of the quality of sparse cuts. In our first result, we present general upper bounds on \({\text {d}}(P, P^k)\) which depend on the number of vertices in the polytope. Our bounds imply that if \(P\) has polynomially many vertices, using half sparsity already approximates it very well. Second, we present a lower bound on \({\text {d}}(P, P^k)\) for random polytopes showing that the upper bounds are quite tight. Third, we show that for a class of hard packing IPs, sparse cutting-planes do not approximate the integer hull well; that is, \({\text {d}}(P, P^k)\) is large for such instances unless \(k\) is very close to \(n\). Finally, we show that using sparse cutting-planes in extended formulations is at least as good as using them in the original polyhedron, and we give an example where the former is actually much better.
Notes
If \(k \ge \frac{8\log 4tn }{9}\), then \(\frac{n^{\frac{1}{4}}\sqrt{8\sqrt{n}}\sqrt{\log 4tn} }{\sqrt{k}} \ge \frac{8\sqrt{n}\log 4tn}{3k}\).
References
Achterberg, T.: Personal communication
Amaldi, E., Coniglio, S., Gualandi, S.: Coordinated cutting plane generation via multi-objective separation. Math. Program. 143(1–2), 87–110 (2014). doi:10.1007/s10107-012-0596-x
Andersen, K., Weismantel, R.: Zero-coefficient cuts. In: Eisenbrand, F., Shepherd F.B. (eds.) Integer Programming and Combinatorial Optimization. Lecture Notes in Computer Science, vol. 6080, pp. 57–70. Springer, Berlin, Heidelberg (2010)
Balas, E., Souza, C.C. de: The vertex separator problem: a polyhedral investigation. Math. Program. 103(3), 583–608 (2005). doi:10.1007/s10107-005-0574-7
Basu, A., Bonami, P., Cornuéjols, G., Margot, F.: On the relative strength of split, triangle and quadrilateral cuts. Math. Program. 126(2), 281–314 (2011)
Basu, A., Cornuéjols, G., Molinaro, M.: A probabilistic analysis of the strength of the split and triangle closures. In: Günlük, O., Woeginger, G. (eds.) Integer Programming and Combinatoral Optimization. Lecture Notes in Computer Science, vol. 6655, pp. 27–38. Springer, Berlin, Heidelberg (2011)
Bixby, R.E.: Solving real-world linear programs: a decade and more of progress. Oper. Res. 50(1), 3–15 (2002). doi:10.1287/opre.50.1.3.17780
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Coleman, T.F.: Large Sparse Numerical Optimization. Springer, New York, NY (1984)
DasGupta, A.: Probability for Statistics and Machine Learning. Springer, Berlin (2011)
David, H., Nagaraja, H.: Order Statistics. Wiley, New York (2003)
Dey, S.S., Iroume, A., Molinaro, M.: Some lower bounds on sparse outer approximations of polytopes. arXiv:1412.3765
Eldersveld, S., Saunders, M.: A block-LU update for large-scale linear programming. SIAM J. Matrix Anal. Appl. 13(1), 191–201 (1992). doi:10.1137/0613016
Goemans, M.X.: Worst-case comparison of valid inequalities for the TSP. Math. Program. 69(2), 335–349 (1995)
Gu, Z.: Personal communication
Jeroslow, R.: On defining sets of vertices of the hypercube by linear inequalities. Discret. Math. 11, 119–124 (1975)
Kaparis, K., Letchford, A.N.: Separation algorithms for 0–1 knapsack polytopes. Math. Program. 124(1–2), 69–91 (2010)
Koltchinskii, V.: Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Springer, Berlin (2011)
Matousek, J., Vondrak, J.: The Probabilistic Method (2008). Manuscript
Narisetty, A.: Personal communication
Reid, J.: A sparsity-exploiting variant of the Bartels-Golub decomposition for linear programming bases. Math. Program. 24(1), 55–69 (1982). doi:10.1007/BF01585094
Ziegler, G.M.: Lectures on 0/1-polytopes. In: Polytopes Combinatorics and Computation, pp. 1–41. Springer, Berlin (2000)
Additional information
Santanu S. Dey and Qianyi Wang were partially supported by NSF Grant CMMI-1149400.
Appendices
Appendix 1: Concentration inequalities
We state Bernstein’s inequality in a slightly weaker but more convenient form.
Theorem 8
(Bernstein’s Inequality [18, Appendix A.2]) Let \({\varvec{X}}_1, {\varvec{X}}_2, \ldots , {\varvec{X}}_n\) be independent random variables such that \(|{\varvec{X}}_i - {\mathbb {E}}[{\varvec{X}}_i]| \le M\) for all \(i \in [n]\). Let \({\varvec{X}}= \sum _{i = 1}^n {\varvec{X}}_i\) and define \(\sigma ^2 = {\text {Var}}({\varvec{X}})\). Then for all \(t > 0\) we have
$$\begin{aligned} \Pr \left( {\varvec{X}}- {\mathbb {E}}[{\varvec{X}}] > t \right) \le \exp \left( - \frac{t^2}{2 \sigma ^2 + \frac{2}{3} M t} \right) . \end{aligned}$$
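As a sanity check (not part of the original paper), the tail bound in this form can be compared against a Monte Carlo estimate for bounded variables; the uniform distribution and the parameters below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, t = 200, 20000, 30.0

# X_i uniform on [-1, 1]: |X_i - E[X_i]| <= M = 1 and Var(X_i) = 1/3
M = 1.0
sigma2 = n / 3.0  # Var(X) for X = sum_i X_i

# Empirical tail P(X - E[X] >= t); here E[X] = 0
sums = rng.uniform(-1.0, 1.0, size=(trials, n)).sum(axis=1)
empirical = float(np.mean(sums >= t))

# Bernstein's bound: exp(-t^2 / (2 sigma^2 + (2/3) M t))
bound = float(np.exp(-t**2 / (2.0 * sigma2 + (2.0 / 3.0) * M * t)))
```

The empirical tail probability should land below the bound, typically by an order of magnitude or more for this choice of \(t\).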
Appendix 2: Empirically generating lower bound on \(d(P,P^k)\)
We estimate a lower bound on \(d(P,P^k)\) using the following procedure. The input to the procedure is the set of points \(\{p^1, \ldots , p^t \} \in [0,1 ]^n\) which are vertices of \(P\). For every \(I \in {[n] \atopwithdelims ()k}\), we use PORTA to obtain an inequality description of \(P+ \mathbb {R}^{\bar{I}}\). Putting all these inequalities together we obtain an inequality description of \(P^k\). Unfortunately due to the large number of inequalities, we are unable to find the vertices of \(P^k\) using PORTA. Therefore, we obtain a lower bound on \(d(P,P^k)\) via a shooting experiment.
First observe that given \(u \in \mathbb {R}^n{\setminus } \{0\}\) we obtain a lower bound on \(d(P,P^k)\) as:
$$\begin{aligned} d(P, P^k) \ge \frac{\max _{x \in P^k} \langle u, x\rangle - \max _{y \in P} \langle u, y\rangle }{\Vert u\Vert }. \end{aligned}$$
Moreover, it can be verified that there exists a direction that achieves the exact value of \(d(P, P^k)\). We generated 20,000 random directions \(u\), picked uniformly from the set \([-1,1]^n\). We also found that for instances where \(p^j \in \{ x \in \{0,1 \}^n \,:\, \sum _{i = 1}^n x_i = \frac{n}{2}\}\), the directions \((\frac{1}{\sqrt{n}}, \ldots , \frac{1}{\sqrt{n}})\) and \(-(\frac{1}{\sqrt{n}}, \ldots , \frac{1}{\sqrt{n}})\) yield good lower bounds. The figure in Section 1.3(c) plots the best lower bound among the 20,002 lower bounds found as above.
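The shooting computation can be sketched as follows. This is an illustrative sketch, not the authors' code: `shooting_lower_bound` is a hypothetical helper, scipy's LP solver stands in for the PORTA-based machinery, and the toy example assumes an inequality description \(Ax \le b\) of \(P^k\) is already available:

```python
import numpy as np
from scipy.optimize import linprog

def shooting_lower_bound(A, b, vertices, u):
    """Lower bound on d(P, P^k) along direction u:
    (max_{x in P^k} u.x - max_j u.p^j) / ||u||_2."""
    # linprog minimizes, so maximize u.x over P^k via the objective -u
    res = linprog(-u, A_ub=A, b_ub=b, bounds=[(None, None)] * len(u))
    support_Pk = -res.fun
    support_P = max(float(u @ p) for p in vertices)
    return (support_Pk - support_P) / np.linalg.norm(u)

# Toy example: P = conv{(0,0), (1,1)} inside [0,1]^2. The only 1-sparse
# valid cuts are the box bounds, so P^1 = [0,1]^2, described by Ax <= b.
A = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.]])
b = np.array([1., 0., 1., 0.])
vertices = [np.array([0., 0.]), np.array([1., 1.])]

rng = np.random.default_rng(0)
best = max(shooting_lower_bound(A, b, vertices, rng.uniform(-1, 1, 2))
           for _ in range(500))
```

For this toy instance the true value is \(d(P, P^1) = 1/\sqrt{2}\) (attained at the vertex \((1,0)\)), and the best random direction should come close to it from below.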
Appendix 3: Anticoncentration of linear combination of Bernoulli’s
It is convenient to restate Lemma 3 in terms of Rademacher random variables (i.e., random variables that take values \(-1\) and \(1\) with equal probability).
Lemma 14
(Lemma 3, restated) Let \({\varvec{X}}_1, {\varvec{X}}_2, \ldots , {\varvec{X}}_n\) be independent Rademacher random variables. Then for every \(a \in [-1,1]^n\),
We start with the case where all coordinates of the vector a are similar.
Lemma 15
Let \({\varvec{X}}_1, {\varvec{X}}_2, \ldots , {\varvec{X}}_n\) be independent Rademacher random variables. For every \(\epsilon \ge 1/20\) and \(a \in [1 - \epsilon , 1]^n\),
Proof
Since \(a{\varvec{X}}= \sum _{i=1}^n {\varvec{X}}_i - \sum _{i=1}^n (1-a_i) {\varvec{X}}_i\), having \(\sum _{i=1}^n {\varvec{X}}_i \ge 2t\) and \(\sum _{i=1}^n (1-a_i) {\varvec{X}}_i \le t\) implies that \(a{\varvec{X}}\ge t\). Therefore,
where the second inequality comes from union bound. For \(t \in [0, n/8]\), the first term in the right-hand side can be lower bounded by \(e^{-\frac{50 t^2}{n}}\) (see for instance Section 7.3 of [19]). The second term in the right-hand side can be bounded using Bernstein’s inequality: given that \({\text {Var}}(\sum _{i=1}^n (1 -a_i) {\varvec{X}}_i) = \sum _{i=1}^n (1 - a_i)^2 \le n \epsilon ^2\), we get that for all \(t \in [0,n/8]\)
The lemma then follows by plugging these bounds into (3) and using \(t = \alpha \sqrt{n} \ge \frac{\alpha }{\sqrt{n}}\Vert a\Vert _1\). \(\square \)
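The probabilistic ingredient used for the first term can be checked numerically (a rough sketch with illustrative parameters, exploiting the fact that a sum of \(n\) Rademacher variables has the same distribution as \(2\,\mathrm{Bin}(n, 1/2) - n\)):

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 400, 100000
t = 0.5 * np.sqrt(n)  # any t in [0, n/8] works here

# Sum of n Rademacher variables, simulated via a binomial
sums = 2 * rng.binomial(n, 0.5, size=trials) - n

# Chernoff-type lower bound used in the proof:
# P(sum_i X_i >= 2t) >= e^{-50 t^2 / n}
tail = float(np.mean(sums >= 2 * t))
lower = float(np.exp(-50.0 * t**2 / n))
```

With this choice of \(t\), the empirical tail is a sizeable constant while the lower bound is tiny, so the inequality holds with ample slack.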
Proof of Lemma 14
Without loss of generality assume \(a > 0\), since flipping the sign of negative coordinates of a changes neither the distribution of \(a{\varvec{Z}}\) nor the term \(\frac{\alpha }{\sqrt{n}} \left( 1 - \frac{2}{n^2}\right) \Vert a\Vert _1\). Also assume without loss of generality that \(\Vert a\Vert _\infty = 1\). The idea of the proof is to bucket the coordinates so that within each bucket the values of a are within a factor of \((1 \pm \epsilon )\) of each other, and then to apply Lemma 15 in each bucket.
The first step is to trim the coefficients of a that are very small. Define the trimmed version b of a by setting \(b_i = a_i\) for all i where \(a_i \ge 1/n^3\) and \(b_i = 0\) for all other i. We first show that
and then we argue that the error introduced by considering b instead of a is small.
For \(j \in \{0, 1, \ldots , \frac{3 \log n}{\epsilon }\}\), define the jth bucket as \(I_j = \{i : b_i \in ((1-\epsilon )^{j+1}, (1-\epsilon )^j]\}\). Since \((1-\epsilon )^{\frac{3 \log n}{\epsilon }} \le e^{-3 \log n} = 1/n^3\), we have that every index i with \(b_i > 0\) lies within some bucket.
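The bucketing step can be made concrete with a short sketch (illustrative code; `bucket_indices` is a hypothetical helper name). Each positive \(b_i \le 1\) equals \((1-\epsilon )^q\) for some \(q \ge 0\), and its bucket index is \(j = \lfloor q \rfloor \):

```python
import math

def bucket_indices(b, eps):
    """Assign each i with b_i > 0 to the bucket j with
    b_i in ((1-eps)^{j+1}, (1-eps)^j]. Assumes 0 < b_i <= 1
    (i.e., after normalizing ||a||_inf = 1 and trimming)."""
    buckets = {}
    for i, bi in enumerate(b):
        if bi <= 0:
            continue
        # b_i = (1-eps)^q with q >= 0; its bucket index is floor(q)
        j = math.floor(math.log(bi) / math.log(1 - eps))
        buckets.setdefault(j, []).append(i)
    return buckets

eps = 1 / 20
b = [1.0, 0.97, 0.5, 0.031, 0.0, 0.2]  # one zero entry (trimmed away)
B = bucket_indices(b, eps)
```

Every positive entry lands in exactly one bucket, and within a bucket the entries agree up to a factor of \(1 - \epsilon \).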
Now fix some bucket j. Set \(\epsilon = 1/20\) and \(\gamma = \frac{\alpha }{\sqrt{n}}\). Let \(E_j\) be the event that \(\sum _{i \in I_j} b_i {\varvec{Z}}_i \ge \gamma \sum _{i \in I_j} b_i\). Employing Lemma 15 to the rescaled vector \((1-\epsilon )^{-j}\, b|_{I_j}\), whose entries lie in \((1-\epsilon , 1]\), gives
But now notice that if in a scenario we have \(E_j\) holding for all j, then in this scenario we have \(b{\varvec{Z}}\ge \gamma \Vert b\Vert _1\). Using the fact that the \(E_j\)’s are independent (due to the independence of the coordinates of \({\varvec{Z}}\)), we have
Now we claim that whenever \(b{\varvec{Z}}\ge \gamma \Vert b\Vert _1\), we have \(a{\varvec{Z}}\ge \frac{\alpha }{\sqrt{n}} \left( 1 - \frac{2}{n^2} \right) \Vert a\Vert _1\). First notice that \(\Vert b\Vert _1 \ge \Vert a\Vert _1 - 1/n^2 \ge \Vert a\Vert _1 (1 - 1/n^2)\), since \(\Vert a\Vert _1 \ge \Vert a\Vert _\infty = 1\). Moreover, with probability 1 we have \(a{\varvec{Z}}\ge b{\varvec{Z}}- 1/n^2\), since each trimmed coordinate satisfies \(a_i < 1/n^3\) and there are at most n of them. Therefore, whenever \(b{\varvec{Z}}\ge \gamma \Vert b\Vert _1\):
This concludes the proof of the lemma. \(\square \)
Appendix 4: Hard packing integer programs
1.1 Proof of Lemma 8
Fix \(i \in [n]\). We have \({\mathbb {E}}[\sum _{j=1}^m {\varvec{A}}^j_i] = \frac{mM}{2}\) and \({\text {Var}}(\sum _{j=1}^m {\varvec{A}}^j_i) \le \frac{mM^2}{4}\). Employing Bernstein’s inequality we get
where the last inequality uses the assumption that \(m \ge 8 \log 8n\). Similarly, we get that
Taking a union bound over the first displayed inequality for all \(i \in [n]\) and over the last inequality, with probability at least \(3/4\) the valid cut \(\sum _{i} \left( \frac{2}{mM} \sum _j {\varvec{A}}^j_i\right) x_i \le \frac{1}{mM} \sum _{i,j} {\varvec{A}}^j_i\) (obtained by aggregating all inequalities in the formulation) has every left-hand-side coefficient at least \(1 - \frac{2 \sqrt{\log 8n}}{\sqrt{m}}\) and right-hand side at most \(\frac{n}{2} + \frac{\sqrt{n \log 8}}{\sqrt{m}}\). This concludes the proof.
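The aggregation argument can be illustrated numerically. This is a hedged sketch: uniform coefficients on \([0, M]\) are an illustrative choice consistent with the stated mean \(mM/2\) and variance bound \(mM^2/4\):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, M = 200, 100, 10.0

# Random packing constraints: A^j_i uniform on [0, M]
A = rng.uniform(0.0, M, size=(m, n))

# Aggregate all m rows and scale by 2/(mM), as in the cut above
coeffs = (2.0 / (m * M)) * A.sum(axis=0)

# Lemma 8's guarantee on the left-hand-side coefficients:
# each is at least 1 - 2 sqrt(log 8n) / sqrt(m)
dev = 2.0 * np.sqrt(np.log(8 * n) / m)
```

The scaled coefficients concentrate around 1, and the deviation guarantee holds with a wide margin at these parameter values.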
1.2 Proof of Lemma 9
Fix \(j \in [m]\). We have \({\mathbb {E}}[\sum _{i = 1}^n {\varvec{A}}^j_i] = \frac{n M}{2}\) and \({\text {Var}}( \sum _{i = 1}^n {\varvec{A}}^j_i) \le n M^2/4\) and hence by Bernstein’s inequality we get
where the last inequality uses the assumption that \(m \le n\). The lemma then follows by taking a union bound over all \(j \in [m]\).
Dey, S.S., Molinaro, M. & Wang, Q. Approximating polyhedra with sparse inequalities. Math. Program. 154, 329–352 (2015). https://doi.org/10.1007/s10107-015-0925-y