Abstract
We consider semidefinite optimization in a saddle point formulation where the primal solution is in the spectrahedron and the dual solution is a distribution over affine functions. We present an approximation algorithm for this problem that runs in sublinear time in the size of the data. To the best of our knowledge, this is the first algorithm to achieve this. Our algorithm is also guaranteed to produce low-rank solutions. We further prove lower bounds on the running time of any algorithm for this problem, showing that certain terms in the running time of our algorithm cannot be further improved. Finally, we consider a non-affine version of the saddle point problem and give an algorithm that under certain assumptions runs in sublinear time.
Notes
The results presented in this paper are a continuation of preliminary results on sublinear semidefinite optimization presented in [11].
Our results hold also under the weaker assumption that every \(c_i\) has a supergradient everywhere in \(\mathcal {S}\).
As stated before, it suffices to assume that \(c_i\) has a supergradient everywhere in \(\mathcal {S}\).
References
Agarwal, A., Charikar, M., Makarychev, K., Makarychev, Y.: \(O(\sqrt{\log n})\) approximation algorithms for Min UnCut, Min 2CNF Deletion, and directed cut problems. In: Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, STOC ’05, pp. 573–581 (2005)
Arora, S., Hazan, E., Kale, S.: The multiplicative weights update method: a meta-algorithm and applications. Theory Comput. 8(1), 121–164 (2012)
Arora, S., Lee, J.R., Naor, A.: Euclidean distortion and the sparsest cut. In: Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, STOC ’05, pp. 553–562 (2005)
Arora, S., Rao, S., Vazirani, U.: Expander flows, geometric embeddings and graph partitioning. In: Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, STOC ’04, pp. 222–231 (2004)
Baes, M., Bürgisser, M., Nemirovski, A.: A randomized mirror-prox method for solving structured large-scale matrix saddle-point problems. SIAM J. Optim. 23(2), 934–962 (2013)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Cesa-Bianchi, N., Lugosi, G.: Prediction, Learning, and Games. Cambridge University Press, New York, NY (2006)
Clarkson, K.L., Hazan, E., Woodruff, D.P.: Sublinear optimization for machine learning. J. ACM 59(5), 23 (2012)
D’Aspremont, A.W.: Subsampling algorithms for semidefinite programming. Stoch. Syst. 1, 274–305 (2011). doi:10.1214/10-SSY018
d’Aspremont, A., Ghaoui, L.E., Jordan, M.I., Lanckriet, G.R.G.: A direct formulation of sparse PCA using semidefinite programming. SIAM Rev. 3, 41–48 (2004)
Garber, D., Hazan, E.: Approximating semidefinite programs in sublinear time. In: NIPS, pp. 1080–1088 (2011)
Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42, 1115–1145 (1995)
Grigoriadis, M.D., Khachiyan, L.G.: A sublinear-time randomized approximation algorithm for matrix games. Oper. Res. Lett. 18(2), 53–58 (1995)
Hazan, E.: Approximate convex optimization by online game playing. CoRR. arXiv:0610119 (2006)
Hazan, E.: Sparse approximate solutions to semidefinite programs. In: Proceedings of the 8th Latin American conference on Theoretical informatics, LATIN’08, pp. 306–316 (2008)
Sra, S., Nowozin, S., Wright, S.J.: Optimization for machine learning. MIT Press (2012)
Juditsky, A., Nemirovski, A., Tauvel, C.: Solving variational inequalities with stochastic mirror-prox algorithm. Stoch. Syst. 1(1), 17–58 (2011). doi:10.1214/10-SSY011
Kuczyński, J., Woźniakowski, H.: Estimating the largest eigenvalues by the power and Lanczos algorithms with a random start. SIAM J. Matrix Anal. Appl. 13, 1094–1122 (1992)
Lanckriet, G.R.G., Cristianini, N., Ghaoui, L.E., Bartlett, P., Jordan, M.I.: Learning the kernel matrix with semi-definite programming. J. Mach. Learn. Res. 5, 27–72 (2004)
Nemirovski, A.: Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2005)
Nesterov, Y.: Smoothing technique and its applications in semidefinite optimization. Math. Program. 110(2), 245–259 (2007)
Recht, B.: A simpler approach to matrix completion. J. Mach. Learn. Res. 12, 3413–3430 (2011)
Shalev-Shwartz, S.: Online learning and online convex optimization. Found. Trends Mach. Learn. 4(2), 107–194 (2012)
Shalev-Shwartz, S., Singer, Y., Ng, A.Y.: Online and batch learning of pseudo-metrics. In: ICML, pp. 743–750 (2004)
Tropp, J.A.: User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 12(4), 389–434 (2012)
Xing, E.P., Ng, A.Y., Jordan, M.I., Russell, S.: Distance metric learning, with application to clustering with side-information. Adv. Neural Inf. Process. Syst. 15, 505–512 (2002)
Zinkevich, M.: Online convex programming and generalized infinitesimal gradient ascent. In: ICML, pp. 928–936 (2003)
Acknowledgments
We would like to thank both anonymous reviewers for their constructive comments, which contributed significantly to the improvement of this paper.
Appendix: Auxiliary lemmas used in the proof of the main theorem
Most of the proofs given below are adapted from [8] to our needs and are given here in full detail for completeness.
We begin by proving Lemma 5.
Proof
As a first step, note that for \(x>C\) we have \(x-\mathbb {E}[X] \ge C/2\), so that
Hence, we obtain,
Similarly one can prove that \(\mathbb {E}[X] - \mathbb {E}[\bar{X}] \ge -2\text{ Var }[X]/C\), and the result follows.
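The clipping bound can be checked numerically. The following is a minimal sketch, assuming that Lemma 5 states \(|\mathbb {E}[\bar{X}] - \mathbb {E}[X]| \le 2\text{ Var }[X]/C\) for \(\bar{X} = {{\mathrm{clip}}}(X,C)\) and \(|\mathbb {E}[X]| \le C/2\), and that clipping means projection onto \([-C, C]\). The two-point distribution and all function names are hypothetical and chosen only for illustration; expectations are computed exactly with rational arithmetic.

```python
from fractions import Fraction


def clip(x, c):
    """Project x onto the interval [-c, c] (assumed meaning of clip)."""
    return max(-c, min(c, x))


def mean(dist, f=lambda x: x):
    """Exact expectation of f(X) for a finite distribution of (value, prob) pairs."""
    return sum(p * f(x) for x, p in dist)


def clipping_bias_and_bound(dist, c):
    """Return (|E[clip(X, c)] - E[X]|, 2 Var[X] / c), both computed exactly."""
    mu = mean(dist)
    var = mean(dist, lambda x: (x - mu) ** 2)
    bias = abs(mean(dist, lambda x: clip(x, c)) - mu)
    return bias, 2 * var / c


# Hypothetical example: X is 0 with probability 99/100 and 10 with probability 1/100.
dist = [(Fraction(0), Fraction(99, 100)), (Fraction(10), Fraction(1, 100))]
C = Fraction(4)  # satisfies |E[X]| = 1/10 <= C/2
bias, bound = clipping_bias_and_bound(dist, C)
print(bias, bound)
```

In this example the bias introduced by clipping is 3/50, well below the bound 99/200.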
In the following lemmas we assume only that \(v_t(i) = {{\mathrm{clip}}}(\tilde{v}_t(i),1/\eta )\) is the clipping of a random variable \(\tilde{v}_t(i)\), the variance of \(\tilde{v}_t(i)\) is at most one (\(\text{ Var }[\tilde{v}_t(i)] \le 1\)) and we use the notation \(\mu _t(i) = \mathbb {E}[\tilde{v}_t(i)]\). We also assume that the expectations of \(\tilde{v}_t(i)\) are bounded in absolute value by a constant, \(|\mu _t(i)| \le C\), such that \(1 \le 2C \le 1/\eta \).
The following lemma is a Bernstein-type inequality for martingales. For a proof see [8], Lemma B.3.
Lemma 22
Let \(\{Z_t\}\) be a martingale difference sequence with respect to filtration \(\{S_t\}\) (i.e., \(\mathbb {E}[Z_t|S_1,\ldots ,S_t] = 0\)). Assume the filtration \(\{S_t\}\) is such that the values in \(S_t\) are determined using only those in \(S_{t-1}\), and not any previous history, and so the joint probability distribution satisfies:
In addition, assume for all t, \(\mathbb {E}[Z_t^2 | S_1,\ldots ,S_t]\le s\), and \(|Z_t| \le V\). Then
The following three lemmas are used to prove Lemmas 12 and 17.
Lemma 23
For \(\sqrt{\frac{4\log {m}}{3T}} \le \eta \le 1/(2C)\), with probability at least \(1-O(1/m)\) it holds that
Proof
Lemma 5 implies that \(|\mathbb {E}[{v}_t(i)] - \mu _t(i)| \le 2\eta \), since \(\text{ Var }[\tilde{v}_t(i)]\le 1\).
We show that for given \(i\in [m]\), with probability \(1 - O(1/m^{2})\), \(\sum _{t\in [T]}[v_t(i) - \mathbb {E}[{v}_t(i)]] \le 3 \eta T \), and then apply the union bound over all \(i\in [m]\). This together with the above bound on \(|\mathbb {E}[{v}_t(i)] - \mu _t(i)|\) implies the lemma via the triangle inequality.
Fixing i, let \(Z_t^i \equiv v_t(i) - \mathbb {E}[{v}_t(i)]\), and consider the filtration given by
Using the notation \(\mathbb {E}_t[\cdot ] = \mathbb {E}[\cdot |S_t]\), observe that
1. \(\forall t \ . \ \mathbb {E}_t[(Z_t^i) ^2 ] = \mathbb {E}_t[v_t(i)^2] - \mathbb {E}_t[v_t(i)]^2 = \text{ Var }(v_t(i)) \le 1\).
2. \(|Z_t^i|\le 2/\eta \). This holds since by construction, \(|v_t(i)|\le 1/\eta \), and hence
$$\begin{aligned} |Z_t^i|&= | v_t(i) - \mathbb {E}[v_t(i)]| \le | v_t(i)| + |\mathbb {E}[v_t(i)]| \le \frac{2}{\eta }. \end{aligned}$$
Using these conditions, despite the fact that the \(Z_t^i\) are not independent, we can use Lemma 22, and conclude that \(Z\equiv \sum _{t\in [T]} Z_t^i\) satisfies the Bernstein-type inequality with \(s=1\) and \(V=2/\eta \), and so
Hence, for \(a \ge 3\) we have that
For \(\eta \ge \sqrt{4\log {m}/(aT)}\), the above probability is at most \(e^{- 2 \log {m}} = 1/m^2\). Letting \(a=3\) we obtain the statement of the lemma.
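The arithmetic behind this choice of parameters can be sketched in code. The sketch assumes that the Bernstein-type inequality of Lemma 22 has the standard form \(\Pr [\sum _t Z_t \ge \alpha ] \le \exp \left( -\frac{\alpha ^2/2}{Ts + \alpha V/3}\right) \); this form is an assumption, not a quotation of the lemma. The function names and the values of \(m\) and \(T\) are hypothetical.

```python
import math


def bernstein_tail(alpha, T, s, V):
    """Assumed Bernstein-type bound: exp(-(alpha^2 / 2) / (T*s + alpha*V/3))."""
    return math.exp(-(alpha ** 2 / 2) / (T * s + alpha * V / 3))


def lemma23_tail(a, eta, T):
    """Bound on Pr[sum_t Z_t >= a*eta*T] with s = 1 and V = 2/eta."""
    return bernstein_tail(a * eta * T, T, 1.0, 2.0 / eta)


m, T, a = 1000, 10_000, 3                    # hypothetical problem sizes
eta = math.sqrt(4 * math.log(m) / (a * T))   # smallest eta permitted for a = 3
tail = lemma23_tail(a, eta, T)
print(tail, 1 / m ** 2)
```

With \(s=1\), \(V=2/\eta \) and \(\alpha = a\eta T\), the exponent equals \(a^2\eta ^2 T / (2(1+2a/3))\), which is at least \(a\eta ^2 T/2\) whenever \(a \ge 3\). At \(a=3\) and the smallest admissible \(\eta \), the bound evaluates to \(1/m^2\), up to floating-point rounding.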
Lemma 24
For \(\sqrt{\log {m}/T} \le \eta \le 1/(2C)\), with probability at least \(1-O(1/m)\),
Proof
This lemma is proved in essentially the same manner as Lemma 23; the proof is given below for completeness.
Lemma 5 implies that \( |\mathbb {E}[{v}_t(i)] - \mu _t(i)| \le 2\eta , \) using \(\text{ Var }[\tilde{v}_t(i)]\le 1\). Since \(p_t\) is a distribution, it follows that \( |\mathbb {E}[p_t ^{\top }{v}_t] - p_t ^{\top }\mu _t| \le 2\eta . \) Let \(Z_t \equiv p_t ^{\top }v_t - \mathbb {E}[p_t ^{\top }{v}_t] = \sum _i p_t(i) Z_t^i\), where \(Z_t^i = v_t(i) - \mathbb {E}[v_t(i)]\). Consider the filtration given by
Using the notation \(\mathbb {E}_t[\cdot ] = \mathbb {E}[\cdot |S_t]\), the quantities \(|Z_t|\) and \(\mathbb {E}_t[Z_t^2] \) can be bounded as follows:
Also, using properties of variance, we have that
With these conditions, we can use Lemma 22 and conclude that \(Z\equiv \sum _{t\in [T]} Z_t\) satisfies the Bernstein-type inequality with \(s=1\) and \(V=2/\eta \), and so
Hence, for \(a \ge 3\) we have that
For \(\eta \ge \sqrt{2\log {m}/(aT)}\), the above probability is at most \(e^{-\log {m}} = 1/m\). Letting \(a=2\) we obtain the statement of the lemma.
Lemma 25
For \(\sqrt{\log {m}/T} \le \eta \le 1/4\), with probability at least \(1-O(1/m)\),
Proof
Let \(Z_t \equiv \mu _t(i_t) - p_t ^{\top }\mu _t \), where now \(\mu _t\) is a constant vector and \(i_t\) is the random variable, and consider the filtration given by
The expectation of \(\mu _t(i_t) \), conditioned on \(S_t\) and taken with respect to the random choice of \(i_t\), is \(p_t ^{\top }\mu _t\). Hence \(\mathbb {E}_t[Z_t] = 0\), where \(\mathbb {E}_t[\cdot ]\) denotes \(\mathbb {E}[\cdot |S_t]\). The quantities \(|Z_t|\) and \(\mathbb {E}_t[Z_t^2] \) can be bounded as follows:
Applying Lemma 22 to \(Z\equiv \sum _{t\in [T]} Z_t\), with parameters \(s = 4C^2,\ V = 2C\), we obtain
Hence, for \(\eta \le 1/a\) we have that
and if \(\eta \ge \sqrt{10 \log {m}/(a^2 T)}\), the above probability is no more than \(1/m\). Letting \(a=4\) we obtain the lemma.
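The martingale-difference property and the parameters \(s = 4C^2\), \(V = 2C\) used in this proof can be verified exactly on a small instance. The following is a minimal sketch: the distribution `p`, the vector `mu` and the constant `C` are hypothetical values chosen only so that \(|\mu (i)| \le C\) holds.

```python
from fractions import Fraction


def centered_moments(p, mu):
    """Exact E[Z], E[Z^2] and max |Z| for Z = mu[i] - p.mu, with i drawn from p."""
    avg = sum(pi * mi for pi, mi in zip(p, mu))
    z = [mi - avg for mi in mu]
    first = sum(pi * zi for pi, zi in zip(p, z))
    second = sum(pi * zi ** 2 for pi, zi in zip(p, z))
    return first, second, max(abs(zi) for zi in z)


# Hypothetical instance with m = 3 and |mu(i)| <= C = 1.
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
mu = [Fraction(1), Fraction(-1), Fraction(1, 2)]
C = Fraction(1)
first, second, largest = centered_moments(p, mu)
print(first, second, largest)
```

Here the conditional mean is exactly zero, the second moment is 43/64 and the largest deviation is 11/8, so both bounds hold with room to spare.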
The following is a proof of Lemma 13.
Proof
For all \(i\in [m]\) it holds that \(\mathbb {E}[v_t(i)^2] \le \mathbb {E}[\tilde{v}_t(i)^2] \le 4\). Thus since \(p_t\) is a distribution we have that \(\mathbb {E}\left[ {\sum _{t\in {[T]}}p_t^{\top }v_t^2}\right] \le 4T\).
The result follows from applying Markov’s inequality to the random variable \(\sum _{t\in {[T]}}p_t^{\top }v_t^2\).
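The Markov step can be written out explicitly. The following is a sketch in which the threshold is expressed through a free parameter \(k > 1\), a symbol introduced here for illustration and not taken from the statement of Lemma 13. It assumes that \(v_t^2\) denotes the entrywise square, so that the sum is non-negative because \(p_t\) is a distribution.

$$\begin{aligned} \Pr \Big [ \sum _{t\in [T]} p_t^{\top } v_t^2 \ge 4kT \Big ] \le \frac{\mathbb {E}\big [ \sum _{t\in [T]} p_t^{\top } v_t^2 \big ]}{4kT} \le \frac{4T}{4kT} = \frac{1}{k}. \end{aligned}$$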
The following is a proof of Lemma 11.
Proof
The proof relies on the analysis for the Lanczos method in [18], Theorem 4.2.
According to [18], given a positive semidefinite matrix M such that \(\Vert {M}\Vert _2 \le \rho \) and parameters \(\epsilon , \delta > 0\), the Lanczos method returns in time \(O\left( {\frac{N}{\sqrt{\epsilon }}\log \frac{n}{\delta }}\right) \) and with probability at least \(1-\delta \) a vector x such that
In our case M need not be positive semidefinite. Given M such that \(\Vert {M}\Vert _2 \le \rho \), we define \(M' = M + \rho {}\mathbf I \). Thus \(M'\) is positive semidefinite and it holds that \(\Vert {M'}\Vert _2 \le 2\rho \). Thus if we apply the Lanczos procedure with error parameter \(\epsilon ' = \epsilon /(2\rho )\) we get a unit vector x such that
Now it holds that
Thus,
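The shift \(M' = M + \rho \mathbf I \) can be illustrated with a short sketch. Plain power iteration is used here as a simple stand-in for the Lanczos method of [18], and a fixed starting vector replaces the random start so that the run is reproducible; both are simplifications made for this example. The matrix, the value of \(\rho \) and the function names are hypothetical. Since \(x^{\top }M'x = x^{\top }Mx + \rho \) for every unit vector x, a vector that nearly maximizes the quadratic form of \(M'\) also nearly maximizes that of M.

```python
import math


def matvec(M, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def rayleigh(M, x):
    """Quadratic form x^T M x for a unit vector x."""
    return sum(xi * yi for xi, yi in zip(x, matvec(M, x)))


def top_eigvec_shifted(M, rho, iters=200):
    """Power iteration on M' = M + rho*I (a stand-in for Lanczos, not Lanczos itself)."""
    n = len(M)
    shifted = [[M[i][j] + (rho if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    # Fixed start for reproducibility; it must not be orthogonal to the top eigenvector.
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        y = matvec(shifted, x)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x


# Hypothetical example: M is negative definite, so it is not positive semidefinite.
M = [[-2.0, 0.0], [0.0, -1.0]]
rho = 2.0  # upper bound on the spectral norm of M
x = top_eigvec_shifted(M, rho)
value = rayleigh(M, x)
print(x, value)
```

Without the shift, power iteration on this M would converge to the eigenvector of eigenvalue \(-2\), the largest in absolute value. With the shift it recovers the eigenvector of the largest eigenvalue, \(-1\).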
Garber, D., Hazan, E. Sublinear time algorithms for approximate semidefinite programming. Math. Program. 158, 329–361 (2016). https://doi.org/10.1007/s10107-015-0932-z