Abstract

Stochastic gradient descent (SGD) and randomized coordinate descent (RCD) are two of the workhorses for training modern automated decision systems. Intriguingly, the convergence properties of these methods are not well established once we move away from the specific case of smooth minimization. In this dissertation, we focus on the related problems of nonsmooth optimization and min-max optimization to improve the theoretical understanding of stochastic algorithms.

First, we study SGD-based adaptive algorithms and propose a regret analysis framework that overcomes the limitations of existing analyses in the convex case. In the nonconvex case, we prove convergence of an adaptive gradient algorithm for constrained weakly convex optimization, generalizing previously known results on unconstrained smooth optimization. We also propose an algorithm that combines Nesterov's smoothing with SGD to solve convex problems with infinitely many linear constraints at optimal rates.

Then, we move on to convex-concave min-max problems with bilinear coupling and analyze primal-dual coordinate descent (PDCD) algorithms. We obtain the first PDCD methods with the optimal $\mathcal{O}(1/k)$ rate on the standard optimality measure, the expected primal-dual gap, resolving a question that had been open since 2014. Our analysis also helps explain the practical behavior of these algorithms: under a certain error bound condition, the last iterate enjoys adaptive linear convergence without any change to the algorithm's parameters. Furthermore, we propose an algorithm that combines the favorable properties of two branches of PDCD methods: the new method uses large step sizes with dense data, while its per-iteration cost scales with the number of nonzeros of the data matrix. Thanks to these unique properties, the method enjoys compelling practical performance that complements its rigorous theoretical guarantees.

Next, we consider monotone variational inequalities, which generalize convex-concave min-max problems with nonbilinear coupling. We introduce variance-reduced algorithms that converge under the same set of assumptions as their deterministic counterparts and improve the best-known complexities for solving convex-concave min-max problems with finite-sum structure. The optimality of our algorithms for this problem class has been established in recent work via matching lower bounds.

Finally, we present preliminary results on policy optimization methods for solving two-player zero-sum Markov games in competitive reinforcement learning (RL). Although this is a nonconvex-nonconcave min-max problem in general, its special structure makes it tractable to find an approximate Nash equilibrium. We introduce an algorithm that improves the best-known sample complexity of policy gradient methods. This development combines tools from RL and stochastic primal-dual optimization, demonstrating the importance of techniques from convex-concave optimization.
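To make the min-max terminology above concrete, the bilinear saddle-point template behind the PDCD results can be written in a standard form (the notation here is generic and not necessarily the dissertation's): given convex functions $f$ and $g$ and a data matrix $A$, the problem is
$$\min_{x} \max_{y} \; \mathcal{L}(x, y) := f(x) + \langle A x, y \rangle - g(y),$$
and the primal-dual gap of a candidate pair $(\bar{x}, \bar{y})$, measured over compact sets $\mathcal{X} \times \mathcal{Y}$, is
$$\mathrm{Gap}(\bar{x}, \bar{y}) := \max_{y \in \mathcal{Y}} \mathcal{L}(\bar{x}, y) - \min_{x \in \mathcal{X}} \mathcal{L}(x, \bar{y}).$$
The optimal $\mathcal{O}(1/k)$ rate mentioned above is a bound of this form on $\mathbb{E}[\mathrm{Gap}(\bar{x}_k, \bar{y}_k)]$ after $k$ iterations.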
