Sublinear time algorithms for approximate semidefinite programming

Garber, Dan; Hazan, Elad

doi:10.1007/s10107-015-0932-z

Full Length Paper
Series A
Published: 05 July 2015

Volume 158, pages 329–361, (2016)
Mathematical Programming


Abstract

We consider semidefinite optimization in a saddle point formulation where the primal solution is in the spectrahedron and the dual solution is a distribution over affine functions. We present an approximation algorithm for this problem that runs in sublinear time in the size of the data. To the best of our knowledge, this is the first algorithm to achieve this. Our algorithm is also guaranteed to produce low-rank solutions. We further prove lower bounds on the running time of any algorithm for this problem, showing that certain terms in the running time of our algorithm cannot be further improved. Finally, we consider a non-affine version of the saddle point problem and give an algorithm that under certain assumptions runs in sublinear time.

Notes

The results presented in this paper are a continuation of preliminary results on sublinear semidefinite optimization presented in [11].
Our results hold also under the weaker assumption that every $c_i$ has a supergradient everywhere in $\mathcal {S}$.
As stated before it suffices to assume that $c_i$ has a supergradient everywhere in $\mathcal {S}$.

References

[1] Agarwal, A., Charikar, M., Makarychev, K., Makarychev, Y.: $O(\sqrt{\log n})$ approximation algorithms for min uncut, min 2CNF deletion, and directed cut problems. In: Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, STOC ’05, pp. 573–581 (2005)
[2] Arora, S., Hazan, E., Kale, S.: The multiplicative weights update method: a meta-algorithm and applications. Theory Comput. 8(1), 121–164 (2012)
[3] Arora, S., Lee, J.R., Naor, A.: Euclidean distortion and the sparsest cut. In: Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, STOC ’05, pp. 553–562 (2005)
[4] Arora, S., Rao, S., Vazirani, U.: Expander flows, geometric embeddings and graph partitioning. In: Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, STOC ’04, pp. 222–231 (2004)
[5] Baes, M., Bürgisser, M., Nemirovski, A.: A randomized mirror-prox method for solving structured large-scale matrix saddle-point problems. SIAM J. Optim. 23(2), 934–962 (2013)
[6] Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
[7] Cesa-Bianchi, N., Lugosi, G.: Prediction, Learning, and Games. Cambridge University Press, New York, NY (2006)
[8] Clarkson, K.L., Hazan, E., Woodruff, D.P.: Sublinear optimization for machine learning. J. ACM 59(5), 23 (2012)
[9] D’Aspremont, A.W.: Subsampling algorithms for semidefinite programming. Stoch. Syst. 1, 274–305 (2011). doi:10.1214/10-SSY018
[10] d’Aspremont, A., Ghaoui, L.E., Jordan, M.I., Lanckriet, G.R.G.: A direct formulation of sparse PCA using semidefinite programming. SIAM Rev. 3, 41–48 (2004)
[11] Garber, D., Hazan, E.: Approximating semidefinite programs in sublinear time. In: NIPS, pp. 1080–1088 (2011)
[12] Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42, 1115–1145 (1995)
[13] Grigoriadis, M.D., Khachiyan, L.G.: A sublinear-time randomized approximation algorithm for matrix games. Oper. Res. Lett. 18(2), 53–58 (1995)
[14] Hazan, E.: Approximate convex optimization by online game playing. CoRR. arXiv:0610119 (2006)
[15] Hazan, E.: Sparse approximate solutions to semidefinite programs. In: Proceedings of the 8th Latin American conference on Theoretical informatics, LATIN’08, pp. 306–316 (2008)
[16] Sra, S., Nowozin, S., Wright, S.J.: Optimization for machine learning. MIT Press (2012)
[17] Juditsky, A., Nemirovski, A., Tauvel, C.: Solving variational inequalities with stochastic mirror-prox algorithm. Stoch. Syst. 1(1), 17–58 (2011). doi:10.1214/10-SSY011
[18] Kuczyński, J., Woźniakowski, H.: Estimating the largest eigenvalues by the power and Lanczos algorithms with a random start. SIAM J. Matrix Anal. Appl. 13, 1094–1122 (1992)
[19] Lanckriet, G.R.G., Cristianini, N., Ghaoui, L.E., Bartlett, P., Jordan, M.I.: Learning the kernel matrix with semi-definite programming. In: Journal of Machine Learning Research, pp. 27–72 (2004)
[20] Nemirovski, A.: Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2005)
[21] Nesterov, Y.: Smoothing technique and its applications in semidefinite optimization. Math. Program. 110(2), 245–259 (2007)
[22] Recht, B.: A simpler approach to matrix completion. J. Mach. Learn. Res. 12, 3413–3430 (2011)
[23] Shalev-Shwartz, S.: Online learning and online convex optimization. Found. Trends Mach. Learn. 4(2), 107–194 (2012)
[24] Shalev-Shwartz, S., Singer, Y., Ng, A.Y.: Online and batch learning of pseudo-metrics. In: ICML, pp. 743–750 (2004)
[25] Tropp, J.A.: User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 12(4), 389–434 (2012)
[26] Xing, E.P., Ng, A.Y., Jordan, M.I., Russell, S.: Distance metric learning, with application to clustering with side-information. Adv. Neural Inf. Process. Syst. 15, 505–512 (2002)
[27] Zinkevich, M.: Online convex programming and generalized infinitesimal gradient ascent. In: ICML, pp. 928–936 (2003)

Acknowledgments

We would like to thank both of the anonymous reviewers for their constructive comments which have contributed significantly to the improvement of this paper.

Author information

Authors and Affiliations

Department of Industrial Engineering and Management, Technion, 32000, Haifa, Israel
Dan Garber & Elad Hazan

Corresponding author

Correspondence to Dan Garber.

Appendix: Auxiliary lemmas used in the proof of the main theorem

Most of the proofs given below are adapted from [8] to our needs and are given here in full detail for completeness.

We begin by proving Lemma 5.

Proof

As a first step, note that for $x>C$ we have $x-\mathbb {E}[X] \ge C/2$, so that

$$\begin{aligned} C(x-C) \le 2(x-\mathbb {E}[X])(x-C) \le 2(x-\mathbb {E}[X])^2 . \end{aligned}$$

Hence, we obtain,

$$\begin{aligned} \mathbb {E}[X] - \mathbb {E}[\bar{X}]&= \int _{x<-C} (x+C) d\mu _X + \int _{x>C} (x-C) d\mu _X \\&\le \int _{x>C} (x-C) d\mu _X \\&\le \frac{2}{C} \int _{x>C} (x-\mathbb {E}[X])^2 d\mu _X \\&\le \frac{2}{C} \text{ Var }[X] . \end{aligned}$$

Similarly one can prove that $\mathbb {E}[X] - \mathbb {E}[\bar{X}] \ge -2\text{ Var }[X]/C$, and the result follows.
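The bound just proved can be checked numerically on a small example. The sketch below is only an illustration, not part of the original argument: it assumes that clipping means truncation to $[-C, C]$ (as the integrals in the proof suggest) and that $|\mathbb {E}[X]| \le C/2$ (which the first step of the proof uses). The three-point distribution and the helper names are hypothetical.

```python
def clip(x, c):
    """Truncate x to the interval [-c, c] (assumed meaning of clipping)."""
    return max(-c, min(c, x))


def mean(dist, f=lambda x: x):
    """Exact expectation of f(X) for a finite distribution [(value, prob), ...]."""
    return sum(p * f(x) for x, p in dist)


def clipping_bias_and_bound(dist, c):
    """Return (|E[X] - E[clip(X, c)]|, 2 Var[X] / c)."""
    mu = mean(dist)
    var = mean(dist, lambda x: (x - mu) ** 2)
    mu_clipped = mean(dist, lambda x: clip(x, c))
    return abs(mu - mu_clipped), 2.0 * var / c


# Hypothetical heavy-tailed example: a rare large value dominates the variance.
dist = [(0.0, 0.90), (1.0, 0.09), (50.0, 0.01)]
C = 4.0  # E[X] = 0.59 <= C / 2, as the proof requires

bias, bound = clipping_bias_and_bound(dist, C)
print(f"bias = {bias:.4f}, bound 2 Var[X] / C = {bound:.4f}")
```

In this example the bias introduced by clipping is far below the variance-based bound, which is what one expects when most of the mass lies inside $[-C, C]$.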

In the following lemmas we assume only that $v_t(i) = {{\mathrm{clip}}}(\tilde{v}_t(i),1/\eta )$ is the clipping of a random variable $\tilde{v}_t(i)$, the variance of $\tilde{v}_t(i)$ is at most one ($\text{ Var }[\tilde{v}_t(i)] \le 1$) and we use the notation $\mu _t(i) = \mathbb {E}[\tilde{v}_t(i)]$. We also assume that the expectations of $\tilde{v}_t(i)$ are bounded in absolute value by a constant, $|\mu _t(i)| \le C$, such that $1 \le 2C \le 1/\eta $.

The following lemma is a Bernstein-type inequality for martingales. For a proof see [8], Lemma B.3.

Lemma 22

Let $\{Z_t\}$ be a martingale difference sequence with respect to filtration $\{S_t\}$ (i.e., $\mathbb {E}[Z_t|S_1,\ldots ,S_t] = 0$). Assume the filtration $\{S_t\}$ is such that the values in $S_t$ are determined using only those in $S_{t-1}$, and not any previous history, and so the joint probability distribution satisfies:

$$\begin{aligned} {{\mathrm{Pr}}}\left( S_1=s_1, S_2=s_2, \ldots , S_T=s_T\right) = \prod _{t\in [T-1]} {{\mathrm{Pr}}}\left( S_{t+1}=s_{t+1}\mid S_t=s_t\right) . \end{aligned}$$

In addition, assume for all t, $\mathbb {E}[Z_t^2 | S_1,\ldots ,S_t]\le s$, and $|Z_t| \le V$. Then

$$\begin{aligned} \log {{\mathrm{Pr}}}\left( \sum \nolimits _{t\in [T]} Z_t \ge \alpha \right) \le -\frac{\alpha ^2/2}{Ts + \alpha V/3}. \end{aligned}$$
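To get a feel for the inequality, the sketch below compares it with an exact tail probability in the simplest special case, which is an assumption made here for illustration only: the $Z_t$ are independent uniform $\pm 1$ signs, so that $s = 1$ and $V = 1$. The function names are hypothetical.

```python
import math


def bernstein_bound(alpha, T, s, V):
    """Right-hand side of the lemma, exponentiated: exp(-(alpha^2/2) / (T s + alpha V / 3))."""
    return math.exp(-(alpha ** 2 / 2.0) / (T * s + alpha * V / 3.0))


def rademacher_tail(alpha, T):
    """Exact Pr(sum of T independent uniform +-1 signs >= alpha)."""
    favourable = 0
    for k in range(T + 1):        # k = number of +1 signs, so the sum equals 2k - T
        if 2 * k - T >= alpha:
            favourable += math.comb(T, k)
    return favourable / 2 ** T


T = 100
checks = [
    (alpha, rademacher_tail(alpha, T), bernstein_bound(alpha, T, s=1.0, V=1.0))
    for alpha in (10, 20, 30, 40)
]
for alpha, exact, bound in checks:
    print(f"alpha={alpha}: exact tail={exact:.3e}, Bernstein-type bound={bound:.3e}")
```

The bound is loose for small $\alpha$ and tightens in relative terms as $\alpha$ grows; the lemma itself covers the much more general dependent setting used in the proofs below.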

The following three lemmas prove Lemmas 12 and 17.

Lemma 23

For $\sqrt{\frac{4\log {m}}{3T}} \le \eta \le 1/(2C)$, with probability at least $1-O(1/m)$ it holds that

$$\begin{aligned} \max _{i\in [m]}\sum _{t\in [T]}[v_t(i) - \mu _t(i)] \le 5 \eta T. \end{aligned}$$

Proof

Lemma 5 implies that $|\mathbb {E}[{v}_t(i)] - \mu _t(i)| \le 2\eta $, since $\text{ Var }[\tilde{v}_t(i)]\le 1$.

We show that for given $i\in [m]$, with probability $1 - O(1/m^{2})$, $\sum _{t\in [T]}[v_t(i) - \mathbb {E}[{v}_t(i)]] \le 3 \eta T $, and then apply the union bound over all $i\in [m]$. This together with the above bound on $|\mathbb {E}[{v}_t(i)] - \mu _t(i)|$ implies the lemma via the triangle inequality.

Fixing i, let $Z_t^i \equiv v_t(i) - \mathbb {E}[{v}_t(i)]$, and consider the filtration given by

$$\begin{aligned} S_t \equiv (x_t, p_t, w_t, v_{t-1}, i_{t-1}, j_{t-1}, v_{t-1} - \mathbb {E}[{v}_{t-1}]). \end{aligned}$$

Using the notation $\mathbb {E}_t[\cdot ] = \mathbb {E}[\cdot |S_t]$, observe that

1. For all $t$, $\mathbb {E}_t[(Z_t^i) ^2 ] = \mathbb {E}_t[v_t(i)^2] - \mathbb {E}_t[v_t(i)]^2 = \text{ Var }(v_t(i)) \le 1$.
2. $|Z_t^i|\le 2/\eta $. This holds since by construction, $|v_t(i)|\le 1/\eta $, and hence
$$\begin{aligned} |Z_t^i| = | v_t(i) - \mathbb {E}[v_t(i)]| \le | v_t(i)| + |\mathbb {E}[v_t(i)]| \le \frac{2}{\eta }. \end{aligned}$$

Using these conditions, despite the fact that the $Z_t^i$ are not independent, we can use Lemma 22, and conclude that $Z\equiv \sum _{t\in [T]} Z_t^i$ satisfies the Bernstein-type inequality with $s=1$ and $V=2/\eta $, and so

$$\begin{aligned} -\log {{\mathrm{Pr}}}\left( Z \ge \alpha \right) \ge \frac{\alpha ^2/2}{Ts + \alpha V/3} \ge \frac{\alpha ^2/2}{T + 2\alpha /(3\eta )} . \end{aligned}$$

Hence, for $a \ge 3$ we have that

$$\begin{aligned} -\log {{\mathrm{Pr}}}\left( Z \ge a \eta T\right) \ge \frac{a^2/2}{1 + 2a/3} \eta ^2 T \ge \frac{a}{2} \eta ^2 T. \end{aligned}$$

For $\eta \ge \sqrt{4\log {m}/aT}$, the above probability is at most $e^{- 2 \log {m}} = 1/m^2$. Letting $a=3$ we obtain the statement of the lemma.
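For completeness, the passage from the bound in $\alpha $ to the bound in $a$ can be spelled out as follows (a worked restatement of the steps in this proof, with no new assumptions). Substituting $\alpha = a\eta T$, $s=1$ and $V = 2/\eta $ gives

$$\begin{aligned} \frac{\alpha ^2/2}{T + 2\alpha /(3\eta )} = \frac{a^2\eta ^2T^2/2}{T + 2aT/3} = \frac{a^2/2}{1 + 2a/3}\, \eta ^2 T , \qquad \frac{a^2/2}{1 + 2a/3} \ge \frac{a}{2} \iff a \ge 1 + \frac{2a}{3} \iff a \ge 3 . \end{aligned}$$

With $a=3$ and $\eta \ge \sqrt{4\log {m}/(3T)}$ the exponent is at least $\frac{3}{2}\cdot \frac{4\log {m}}{3T}\cdot T = 2\log {m}$, so each fixed $i$ fails with probability at most $1/m^2$, and the union bound over the $m$ indices leaves a total failure probability of at most $1/m$.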

Lemma 24

For $\sqrt{\log {m}/T} \le \eta \le 1/(2C)$, with probability at least $1-O(1/m)$,

$$\begin{aligned} \Big | \sum _{t\in [T]}p_t ^{\top }v_t - \sum _{t\in [T]}{}p_t^{\top }\mu _t \Big | \le 4 \eta T . \end{aligned}$$

Proof

This lemma is proved in essentially the same manner as Lemma 23; the proof is given below for completeness.

Lemma 5 implies that $|\mathbb {E}[{v}_t(i)] - \mu _t(i)| \le 2\eta $, using $\text{ Var }[\tilde{v}_t(i)]\le 1$. Since $p_t$ is a distribution, it follows that $|\mathbb {E}[p_t ^{\top }{v}_t] - p_t ^{\top }\mu _t| \le 2\eta $. Let $Z_t \equiv p_t ^{\top }v_t - \mathbb {E}[p_t ^{\top }{v}_t] = \sum _i p_t(i) Z_t^i$, where $Z_t^i = v_t(i) - \mathbb {E}[v_t(i)]$. Consider the filtration given by

$$\begin{aligned} S_t \equiv (x_t, p_t, w_t, v_{t-1}, i_{t-1}, j_{t-1}, v_{t-1} - \mathbb {E}[{v}_{t-1}]) . \end{aligned}$$

Using the notation $\mathbb {E}_t[\cdot ] = \mathbb {E}[\cdot |S_t]$, the quantities $|Z_t|$ and $\mathbb {E}_t[Z_t^2] $ can be bounded as follows:

$$\begin{aligned} |Z_t| = \left| \sum _{i\in [m]}p_t(i) Z_t^i\right| \le \sum _{i\in [m]}p_t(i) \left| Z_t^i\right| \le 2 \eta ^{-1}, \quad \text{using } \left| Z_t^i\right| \le 2\eta ^{-1} \text{ as in Lemma 23}. \end{aligned}$$

Also, using properties of variance, we have that

$$\begin{aligned} \mathbb {E}[Z_t^2] = {{\mathrm{Var}}}[p_t ^{\top }v_t] \le \max _i \text{ Var }[v_t(i)] \le 1. \end{aligned}$$

With these conditions, we can use Lemma 22 and conclude that $Z\equiv \sum _{t\in [T]} Z_t$ satisfies the Bernstein-type inequality with $s=1$ and $V=2/\eta $, and so

$$\begin{aligned} -\log {{\mathrm{Pr}}}\left( Z \ge \alpha \right) \ge \frac{\alpha ^2/2}{Ts + \alpha V/3} \ge \frac{\alpha ^2/2}{T + 2\alpha /(3\eta )} . \end{aligned}$$

Hence, for $a \ge 3$ we have that

$$\begin{aligned} -\log {{\mathrm{Pr}}}\left( Z \ge a \eta T\right) \ge \frac{a^2/2}{1 + 2a/3} \eta ^2 T \ge \frac{a}{2} \eta ^2 T. \end{aligned}$$

For $\eta \ge \sqrt{2\log {m}/aT}$, the above probability is at most $e^{-\log {m}} = 1/m$. Letting $a=2$ we obtain the statement of the lemma.

Lemma 25

For $\sqrt{\log {m}/T} \le \eta \le 1/4$, with probability at least $1-O(1/m)$,

$$\begin{aligned} \Big | \sum _{t\in [T]}\mu _t(i_t) - \sum _{t\in [T]}p_t ^{\top }\mu _t \Big | \le 4 C \eta T . \end{aligned}$$

Proof

Let $Z_t \equiv \mu _t(i_t) - p_t ^{\top }\mu _t $, where now $\mu _t$ is a constant vector and $i_t$ is the random variable, and consider the filtration given by

$$\begin{aligned} S_t \equiv (x_t, p_t, w_t, y_t, v_{t-1}, i_{t-1}, j_{t-1}, Z_{t-1}). \end{aligned}$$

The expectation of $\mu _t(i_t) $, conditioning on $S_t$ with respect to the random choice of $i_t$, is $p_t ^{\top }\mu _t$. Hence $\mathbb {E}_t[Z_t] = 0$, where $\mathbb {E}_t[\cdot ]$ denotes $\mathbb {E}[\cdot |S_t]$. The parameters $|Z_t|$ and $\mathbb {E}[Z_t^2] $ can be bounded as follows:

$$\begin{aligned} |Z_t|&\le |\mu _t(i_t)| + \left| p_t ^{\top }\mu _t\right| \le 2C,\\ \mathbb {E}\left[ Z_t^2\right]&= \mathbb {E}\left[ \left( \mu _t(i_t) - p_t ^{\top }\mu _t\right) ^2 \right] \le 2 \mathbb {E}\left[ \mu _t(i_t)^2\right] + 2 \left( p_t ^{\top }\mu _t\right) ^2 \le 4C^2 . \end{aligned}$$

Applying Lemma 22 to $Z\equiv \sum _{t\in [T]} Z_t$, with parameters $s = 4C^2,\ V = 2C$, we obtain

$$\begin{aligned} -\log {{\mathrm{Pr}}}\left( Z \ge \alpha \right) \ge \frac{\alpha ^2/2}{4C^2T + 2C\alpha /3}. \end{aligned}$$

Hence, for $\eta \le 1/a$ we have that

$$\begin{aligned} -\log {{\mathrm{Pr}}}\left( Z \ge a C \eta T\right) \ge \frac{a^2 \eta ^2 T/2}{4 + 2 a \eta /3} \ge \frac{a^2}{10} \eta ^2 T \end{aligned}$$

and if $\eta \ge \sqrt{10 \log {m}/(a^2 T)}$, the above probability is no more than $1/m$. Letting $a=4$ we obtain the lemma.
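The constants in the last two displays can be checked directly (a worked restatement, with no new assumptions). Substituting $\alpha = aC\eta T$ into the bound with $s = 4C^2$ and $V = 2C$, and then using $a\eta \le 1$:

$$\begin{aligned} \frac{\alpha ^2/2}{4C^2T + 2C\alpha /3} = \frac{a^2C^2\eta ^2T^2/2}{C^2T\left( 4 + 2a\eta /3\right) } = \frac{a^2\eta ^2T/2}{4 + 2a\eta /3} \ge \frac{a^2\eta ^2T/2}{14/3} = \frac{3}{28}\, a^2\eta ^2 T \ge \frac{a^2}{10}\, \eta ^2 T . \end{aligned}$$

With $a=4$ the requirement on $\eta $ reads $\eta \ge \sqrt{5\log {m}/(8T)}$, which is implied by $\eta \ge \sqrt{\log {m}/T}$, and the condition $\eta \le 1/a$ is exactly the assumption $\eta \le 1/4$.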

The following is a proof of Lemma 13.

Proof

For all $i\in [m]$ it holds that $\mathbb {E}[v_t(i)^2] \le \mathbb {E}[\tilde{v}_t(i)^2] \le 4$. Thus since $p_t$ is a distribution we have that $\mathbb {E}\left[ {\sum _{t\in {[T]}}p_t^{\top }v_t^2}\right] \le 4T$.

The result follows from applying Markov’s inequality to the random variable $\sum _{t\in {[T]}}p_t^{\top }v_t^2$.

The following is a proof of Lemma 11.

Proof

The proof relies on the analysis for the Lanczos method in [18], Theorem 4.2.

According to [18], given a positive semidefinite matrix M such that $\Vert {M}\Vert _2 \le \rho $ and parameters $\epsilon , \delta > 0$, the Lanczos method returns in time $O\left( {\frac{N}{\sqrt{\epsilon }}\log \frac{n}{\delta }}\right) $ and with probability at least $1-\delta $ a unit vector x such that

$$\begin{aligned} x^{\top }Mx \ge \lambda _{\max }(M)(1-\epsilon ) . \end{aligned}$$

In our case M need not be positive-semidefinite. Given M such that $\Vert {M}\Vert _2 \le \rho $, we define $M' = M + \rho {}\mathbf I $. Thus $M'$ is positive-semidefinite and it holds that $\Vert {M'}\Vert _2 \le 2\rho $. Thus if we apply the Lanczos procedure with error parameter $\epsilon ' = \epsilon /(2\rho )$ we get a unit vector x such that

$$\begin{aligned} x^{\top }M'x \ge \lambda _{\max }(M') - \frac{\epsilon }{2\rho }\lambda _{\max }(M') \ge \lambda _{\max }(M') - \epsilon . \end{aligned}$$

Now it holds that

$$\begin{aligned} x^{\top }M'x&= x^{\top }Mx + x^{\top }\rho \mathbf{I} x = x^{\top }Mx + \rho \\&\ge \lambda _{\max }(M') - \epsilon = \lambda _{\max }(M) + \rho - \epsilon . \end{aligned}$$

Thus,

$$\begin{aligned} x^{\top }Mx \ge \lambda _{\max }(M) - \epsilon . \end{aligned}$$
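The shifting step can be illustrated with a short sketch. It deliberately swaps in plain power iteration for the Lanczos method of [18], since only the effect of the shift $M' = M + \rho \mathbf{I}$ is being demonstrated; the $2\times 2$ matrix, the start vector and all function names are hypothetical choices made for this example.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rayleigh(A, x):
    """Rayleigh quotient x^T A x / x^T x."""
    Ax = matvec(A, x)
    return sum(xi * yi for xi, yi in zip(x, Ax)) / sum(xi * xi for xi in x)


def power_iteration(A, x0, iters=100):
    """Plain power iteration (a stand-in here for the Lanczos method)."""
    x = list(x0)
    for _ in range(iters):
        y = matvec(A, x)
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            break
        x = [v / norm for v in y]
    return x


def top_eigvec_via_shift(M, rho, x0, iters=100):
    """Run the eigenvector routine on M' = M + rho * I, which is PSD when ||M||_2 <= rho."""
    n = len(M)
    M_shift = [[M[i][j] + (rho if i == j else 0.0) for j in range(n)] for i in range(n)]
    return power_iteration(M_shift, x0, iters)


# Indefinite example: eigenvalues are 1 (eigenvector (1, 1)) and -3 (eigenvector (1, -1)).
M = [[-1.0, 2.0], [2.0, -1.0]]
rho = 3.0          # spectral norm of M
x0 = [1.0, 0.0]

rq_shift = rayleigh(M, top_eigvec_via_shift(M, rho, x0))  # approximates lambda_max(M) = 1
rq_plain = rayleigh(M, power_iteration(M, x0))            # drifts to the eigenvalue -3
print(rq_shift, rq_plain)
```

Without the shift, the iteration locks onto the eigenvalue of largest magnitude, which here is negative; after the shift the largest eigenvalue of $M'$ is also the one of largest magnitude, so the same routine recovers $\lambda _{\max }(M)$.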

About this article

Cite this article

Garber, D., Hazan, E. Sublinear time algorithms for approximate semidefinite programming. Math. Program. 158, 329–361 (2016). https://doi.org/10.1007/s10107-015-0932-z

Received: 20 May 2013
Accepted: 23 June 2015
Published: 05 July 2015
Issue Date: July 2016
DOI: https://doi.org/10.1007/s10107-015-0932-z
