Abstract
Stochastic optimization has grown rapidly in recent decades, and variance reduction techniques are increasingly employed in stochastic optimization algorithms to improve computational efficiency. In this paper, we introduce two projection-free stochastic approximation algorithms for maximizing diminishing return (DR) submodular functions over convex constraints, building upon the Stochastic Path Integrated Differential EstimatoR (SPIDER) and its variants. First, for the monotone case, we present a SPIDER Continuous Greedy (SPIDER-CG) algorithm that guarantees a \((1-e^{-1})\text {OPT}-\varepsilon \) approximation after \(\mathcal {O}(\varepsilon ^{-1})\) iterations and \(\mathcal {O}(\varepsilon ^{-2})\) stochastic gradient computations under the mean-squared smoothness assumption. For the non-monotone case, we develop a SPIDER Frank–Wolfe (SPIDER-FW) algorithm that guarantees a \(\frac{1}{4}(1-\min _{x\in \mathcal {C}}{\Vert x\Vert _{\infty }})\text {OPT}-\varepsilon \) approximation with \(\mathcal {O}(\varepsilon ^{-1})\) iterations and \(\mathcal {O}(\varepsilon ^{-2})\) stochastic gradient estimates. To address the practical challenge of requiring a large number of samples per iteration, we introduce a modified SPIDER-based gradient estimator, yielding Hybrid SPIDER-FW and Hybrid SPIDER-CG algorithms that achieve the same approximation guarantees as SPIDER-FW and SPIDER-CG, respectively, while using only \(\mathcal {O}(1)\) samples per iteration. Numerical experiments on both simulated and real data demonstrate the efficiency of the proposed methods.
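To make the SPIDER mechanism concrete, the following is a minimal sketch of a SPIDER-style continuous greedy loop on a toy problem. It is not the paper's algorithm: the objective (a separable concave surrogate standing in for a DR-submodular function), the oracle `grad_sample`, the batch sizes, and the box constraint \([0,1]^n\) are all illustrative assumptions. It only shows the two ingredients the abstract describes: a periodic large-batch refresh of the gradient estimate, and the recursive SPIDER update \(v_k = v_{k-1} + \nabla \tilde F(x_k;z) - \nabla \tilde F(x_{k-1};z)\) between refreshes.

```python
import numpy as np

def grad_sample(x, noise):
    # Hypothetical noisy gradient oracle for the toy objective
    # F(x) = sum_i (x_i - x_i^2 / 2); the exact gradient is 1 - x.
    return (1.0 - x) + noise

def spider_cg(n=5, K=100, q=10, big=200, small=5, sigma=0.1, seed=0):
    """Sketch of a SPIDER-style continuous greedy loop over the box [0,1]^n."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)            # feasible starting point
    x_prev = x.copy()
    v = np.zeros(n)
    for k in range(K):
        if k % q == 0:
            # periodic refresh of the estimator with a large mini-batch
            noises = sigma * rng.standard_normal((big, n))
            v = np.mean([grad_sample(x, z) for z in noises], axis=0)
        else:
            # SPIDER recursion: the SAME sample z is evaluated at x_k and
            # x_{k-1}, so the noise largely cancels in the difference
            noises = sigma * rng.standard_normal((small, n))
            v = v + np.mean([grad_sample(x, z) - grad_sample(x_prev, z)
                             for z in noises], axis=0)
        # linear maximization over [0,1]^n is coordinatewise
        s = (v > 0).astype(float)
        x_prev = x.copy()
        x = x + (s - x) / K    # continuous-greedy step size 1/K
    return x
```

With the continuous-greedy step size \(1/K\), each coordinate of the output approaches \(1-(1-1/K)^{K}\approx 1-e^{-1}\) on this toy instance, mirroring the \((1-e^{-1})\) flavor of the guarantee; the actual SPIDER-CG analysis of course requires the batch-size and step-size schedules stated in the paper.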
Acknowledgements
The first author and the fourth author are supported by the National Natural Science Foundation of China (No. 12131003), and the first author is also supported by the Major Key Project of PCL (No. PCL2022A05). The second author is supported by the NSERC grant (No. 283106) and NSFC grants (Nos. 11771386, 11728104). The third author is supported by the Major Key Project of PCL (No. PCL2022A05) and the National Natural Science Foundation of China (No. 12271278). The fifth author is supported by the National Natural Science Foundation of China (No. 12371099).
Author information
Authors and Affiliations
Contributions
All authors contributed to the study conception and design. The first draft of the manuscript was written by Yuefang Lian and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial or non-financial interests, or personal relationships, that could have appeared to influence the work reported in this paper.
Appendices
Appendix A
A.1 Proof of Lemma 3.2
Proof
We first prove the conclusion for the monotone case. From Lemma 3.1 and the updating rule, we have
Furthermore, for \(k=0\) with \(\vert \mathcal {M}_{0}\vert =\frac{\sigma ^{2}K^{2}}{L^{2}R^{2}}\), it is easy to obtain
Thus, for \(k\ne 0\) with \(\vert \mathcal {M}_{k}\vert =K\), we have
The conclusion for the non-monotone case follows analogously, using \(\alpha _{k}\le \frac{1}{K}\) and the boundedness of \(\mathcal {C}\). \(\square \)
A.2 Proof of Lemma 3.4
Proof
The updating step in Algorithm 2 and \(v_{k}\in [0,1]^{n}\) imply that
Note that for \(i\in [n]\), we have
where \(a_{p_{k}}=(1+p_{k})^{2}\) and \(b_{p_{k}}=p_{k}\); since the initial point satisfies \(x_{0}\in \arg \min _{x\in \mathcal {C}}\Vert x\Vert _{\infty }\),
which implies that
Furthermore, the inequality (9) follows from Property 2. \(\square \)
Appendix B
B.1 Proof of Theorem 4.1
Proof
By the definition of the potential function (4), Lemma 3.3 and the Cauchy–Schwarz inequality, we obtain
where the third inequality uses \(p_{k}=\frac{k}{K}\) and \(1-e^{-\frac{1}{K}}\le \frac{1}{K}\), and the last follows from inequality (13) in Lemma 4.2. Telescoping the above inequality over \(k\in \{0, \ldots , K-1\}\) yields
where \(p_{K}\ge p_{k}, \forall k\le K\) and \(\sum ^{K}_{k=1}k^{-\frac{1}{2}}\le \int ^{K}_{0}x^{-\frac{1}{2}}dx=2K^{\frac{1}{2}}\). Rearranging the inequality, we have
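The bound \(\sum ^{K}_{k=1}k^{-\frac{1}{2}}\le 2K^{\frac{1}{2}}\) used here (and again in the proof of Theorem 4.2) is the standard integral comparison for a decreasing integrand; as a quick check:

```latex
\sum_{k=1}^{K} k^{-\frac{1}{2}}
\le \sum_{k=1}^{K} \int_{k-1}^{k} x^{-\frac{1}{2}}\,dx
= \int_{0}^{K} x^{-\frac{1}{2}}\,dx
= \Big[\, 2x^{\frac{1}{2}} \,\Big]_{0}^{K}
= 2K^{\frac{1}{2}},
```

since \(x^{-\frac{1}{2}}\) is decreasing, so \(k^{-\frac{1}{2}}\le x^{-\frac{1}{2}}\) for every \(x\in (k-1,k]\).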
where \(F(x_{0})\ge 0, e^{p_{0}}=1\) and \(e^{p_{K}}=e\). \(\square \)
B.2 Proof of Theorem 4.2
Proof
Note that the parameter \(\alpha _{k}=\frac{(p_{k+1}-p_{k})(1+p_{k})}{(1+p_{k+1})^{2}}\le \frac{1}{K}\), which satisfies the condition of Lemma 4.2 for the non-monotone case. From the definition of the potential function (7) and Lemma 3.5, we have
where we use \(\alpha _{k} =\frac{(b_{p_{k+1}}-b_{p_{k}})\sqrt{a_{p_{k}}}}{a_{p_{k+1}}}\) and the Cauchy–Schwarz inequality in the second inequality, and the last inequality is from \(a_{p_{k+1}} \ge a_{p_{k}}, b_{p_{k}}=\frac{k}{K}, \sqrt{a_{p_{k}}}\le 2\), \(\max _{x, y\in \mathcal {C}}\Vert x-y\Vert \le D\) and Lemma 4.2 for non-monotone case.
Summing up the above inequality over \(k\in \{0, \ldots , K-1\}\) yields
where the last inequality comes from \(\sum ^{K-1}_{k=1}k^{-\frac{1}{2}}\le \int ^{K}_{0}x^{-\frac{1}{2}}dx=2K^{\frac{1}{2}}\). Rearranging the inequality, we obtain
where the equality follows from \(b_{p_{K}}=1, b_{p_{0}}=0, a_{p_{K}}=4\) and the non-negativity of F. \(\square \)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lian, Y., Du, D., Wang, X. et al. Stochastic Variance Reduction for DR-Submodular Maximization. Algorithmica 86, 1335–1364 (2024). https://doi.org/10.1007/s00453-023-01195-z