
Stochastic Variance Reduction for DR-Submodular Maximization


Abstract

Stochastic optimization has experienced significant growth in recent decades, with variance reduction techniques becoming increasingly prevalent in stochastic optimization algorithms as a means of enhancing computational efficiency. In this paper, we introduce two projection-free stochastic approximation algorithms for maximizing diminishing returns (DR) submodular functions over convex constraints, building upon the Stochastic Path Integrated Differential EstimatoR (SPIDER) and its variants. First, we present a SPIDER Continuous Greedy (SPIDER-CG) algorithm for the monotone case that guarantees a \((1-e^{-1})\text {OPT}-\varepsilon \) approximation after \(\mathcal {O}(\varepsilon ^{-1})\) iterations and \(\mathcal {O}(\varepsilon ^{-2})\) stochastic gradient computations under the mean-squared smoothness assumption. For the non-monotone case, we develop a SPIDER Frank–Wolfe (SPIDER-FW) algorithm that guarantees a \(\frac{1}{4}(1-\min _{x\in \mathcal {C}}{\Vert x\Vert _{\infty }})\text {OPT}-\varepsilon \) approximation with \(\mathcal {O}(\varepsilon ^{-1})\) iterations and \(\mathcal {O}(\varepsilon ^{-2})\) stochastic gradient estimates. To address the practical challenge posed by the large number of samples required per iteration, we introduce a modified gradient estimator based on SPIDER, leading to a Hybrid SPIDER-FW (Hybrid SPIDER-CG) algorithm, which achieves the same approximation guarantee as the SPIDER-FW (SPIDER-CG) algorithm with only \(\mathcal {O}(1)\) samples per iteration. Numerical experiments on both simulated and real data demonstrate the efficiency of the proposed methods.
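
As a rough illustration of the algorithmic pattern described above, the following Python sketch combines a SPIDER-style recursive gradient estimator with a continuous greedy (Frank–Wolfe) ascent step in the monotone setting: a large anchoring batch at the start, followed by per-iteration gradient-difference corrections on small mini-batches, with the ascent direction supplied by a linear maximization oracle. The helpers stoch_grad, lmo and sample are hypothetical interfaces introduced only for illustration, and the update x + v/K is a generic continuous greedy step; this is a sketch of the overall pattern under those assumptions, not the authors' exact Algorithms 1–3.

import numpy as np

def spider_cg_sketch(stoch_grad, lmo, sample, dim, K, m0, m):
    # stoch_grad(x, z): one stochastic gradient of F at x for sample z (hypothetical interface).
    # lmo(g): linear maximization oracle, argmax over v in C of <g, v> (hypothetical interface).
    # sample(): draws one data sample z (hypothetical interface).
    x = np.zeros(dim)  # continuous greedy typically starts at the origin
    anchor_batch = [sample() for _ in range(m0)]
    # Large anchoring batch (cf. |M_0| in Lemma 3.2) to initialize the estimator.
    g = np.mean([stoch_grad(x, z) for z in anchor_batch], axis=0)
    for k in range(K):
        v = lmo(g)         # Frank-Wolfe / greedy direction
        x_new = x + v / K  # step size 1/K, matching the monotone analysis
        mini_batch = [sample() for _ in range(m)]
        # SPIDER recursion: correct the previous estimate with gradient differences
        # on a fresh mini-batch instead of recomputing a full-batch gradient.
        g = g + np.mean([stoch_grad(x_new, z) - stoch_grad(x, z) for z in mini_batch], axis=0)
        x = x_new
    return x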



References

  1. Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: ICML, pp. 699–707 (2016)

  2. Bach, F.: Submodular functions: from discrete to continuous domains. Math. Program. 175, 419–459 (2019)

  3. Bian, A.A., Mirzasoleiman, B., Buhmann, J., Krause, A.: Guaranteed non-convex optimization: submodular maximization over continuous domains. In: AISTATS, pp. 111–120 (2017)

  4. Bian, A., Levy, K., Krause, A., Buhmann, J.M.: Continuous DR-submodular maximization: structure and algorithms. Adv. Neural. Inf. Process. Syst. 30, 487–497 (2017)

  5. Calinescu, G., Chekuri, C., Pal, M., Vondrák, J.: Maximizing a monotone submodular function subject to a matroid constraint. SIAM J. Comput. 40(6), 1740–1766 (2011)

  6. Chekuri, C., Jayram, T., Vondrák, J.: On multiplicative weight updates for concave and submodular function maximization. In: ITCS, pp. 201–210 (2015)

  7. Chekuri, C., Vondrák, J., Zenklusen, R.: Submodular function maximization via the multilinear relaxation and contention resolution schemes. SIAM J. Comput. 43(6), 1831–1879 (2014)

  8. Cutkosky, A., Orabona, F.: Momentum-based variance reduction in non-convex SGD. Adv. Neural. Inf. Process. Syst. 32, 15157–15166 (2019)

  9. Du, D., Liu, Z., Wu, C., Xu, D., Zhou, Y.: An improved approximation algorithm for maximizing a DR-submodular function over a convex set. arXiv:2203.14740, pp. 1–8 (2022)

  10. Du, D.: Lyapunov function approach for approximation algorithm design and analysis: with applications in submodular maximization. arXiv:2205.12442, pp. 1–30 (2022)

  11. Dürr, C., Thang, N.K., Srivastav, A., Tible, L.: Non-monotone DR-submodular maximization over general convex sets. In: IJCAI, pp. 2148–2154 (2021)

  12. Elenberg, E., Dimakis, A.G., Feldman, M., Karbasi, A.: Streaming weak submodularity: interpreting neural networks on the fly. Adv. Neural. Inf. Process. Syst. 30, 4045–4055 (2017)

  13. Fang, C., Li, C.J., Lin, Z., Zhang, T.: SPIDER: near-optimal non-convex optimization via stochastic path-integrated differential estimator. Adv. Neural. Inf. Process. Syst. 31, 689–699 (2018)

  14. Feldman, M., Naor, J., Schwartz, R.: A unified continuous greedy algorithm for submodular maximization. In: FOCS, pp. 570–579 (2011)

  15. Feldman, M.: Guess free maximization of submodular and linear sums. Algorithmica 83(3), 853–878 (2021)

  16. Fisher, M.L., Nemhauser, G.L., Wolsey, L.A.: An analysis of approximations for maximizing submodular set functions—II, pp. 73–87. Springer, Berlin, Heidelberg (1978)

  17. Frank, M., Wolfe, P.: An algorithm for quadratic programming. Nav. Res. Log. 3(1–2), 95–110 (1956)

  18. Guestrin, C., Krause, A., Singh, A.P.: Near-optimal sensor placements in Gaussian processes. In: ICML, pp. 265–272 (2005)

  19. Hassani, H., Soltanolkotabi, M., Karbasi, A.: Gradient methods for submodular maximization. Adv. Neural. Inf. Process. Syst. 30, 5842–5852 (2017)

  20. Hassani, H., Karbasi, A., Mokhtari, A., Shen, Z.: Stochastic conditional gradient++: (non) convex minimization and continuous submodular maximization. SIAM J. Optim. 30(4), 3315–3344 (2020)

  21. Jaggi, M.: Revisiting Frank–Wolfe: projection-free sparse convex optimization. In: ICML, pp. 427–435 (2013)

  22. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Adv. Neural. Inf. Process. Syst. 26, 315–323 (2013)

  23. Kempe, D., Kleinberg, J., Tardos, É.: Maximizing the spread of influence through a social network. In: KDD, pp. 137–146 (2003)

  24. Les Misérables network dataset—KONECT (2017). http://konect.cc/networks/moreno_lesmis

  25. Lian, Y., Xu, D., Du, D., Zhou, Y.: A stochastic non-monotone DR-submodular maximization problem over a convex set. In: COCOON, pp. 1–11 (2023)

  26. Lin, H., Bilmes, J.: A class of submodular functions for document summarization. In: ACL-HLT, pp. 510–520 (2011)

  27. Mokhtari, A., Hassani, H., Karbasi, A.: Stochastic conditional gradient methods: from convex minimization to submodular maximization. J. Mach. Learn. Res. 21(105), 1–49 (2020)

  28. Nemhauser, G.L., Wolsey, L.A., Fisher, M.L.: An analysis of approximations for maximizing submodular set functions—I. Math. Program. 14(1), 265–294 (1978)

  29. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: a novel method for machine learning problems using stochastic recursive gradient. In: ICML, pp. 2613–2621 (2017)

  30. Niazadeh, R., Roughgarden, T., Wang, J.R.: Optimal algorithms for continuous non-monotone submodular and DR-submodular maximization. J. Mach. Learn. Res. 21(1), 4937–4967 (2020)

  31. Reddi, S.J., Hefny, A., Sra, S., Póczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: ICML, pp. 314–323 (2016)

  32. Soma, T., Kakimura, N., Inaba, K., Kawarabayashi, K.-i.: Optimal budget allocation: theoretical guarantee and efficient algorithm. In: ICML, pp. 351–359 (2014)

  33. Soma, T., Yoshida, Y.: Non-monotone DR-submodular function maximization. In: AAAI, vol. 31 (2017)

  34. Staib, M., Jegelka, S.: Robust budget allocation via continuous submodular functions. In: ICML, pp. 3230–3240 (2017)

  35. Sviridenko, M.: A note on maximizing a submodular set function subject to a knapsack constraint. Oper. Res. Lett. 32(1), 41–43 (2004)

  36. Train bombing network dataset—KONECT (2017). http://konect.cc/networks/moreno_train

  37. Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Math. Program. 191(2), 1005–1071 (2022)

  38. Vondrák, J.: Optimal approximation for the submodular welfare problem in the value oracle model. In: STOC, pp. 67–74 (2008)

  39. Windsurfers network dataset—KONECT (2017). http://konect.cc/networks/moreno_beach

  40. Zhang, Q., Deng, Z., Chen, Z., Hu, H., Yang, Y.: Stochastic continuous submodular maximization: boosting via non-oblivious function. In: ICML, pp. 26116–26134 (2022)

  41. Zhang, M., Shen, Z., Mokhtari, A., Hassani, H., Karbasi, A.: One sample stochastic Frank–Wolfe. In: AISTATS, pp. 4012–4023 (2020)

Acknowledgements

The first author and the fourth author are supported by the National Natural Science Foundation of China (No. 12131003), and the first author is also supported by the Major Key Project of PCL (No. PCL2022A05). The second author is supported by the NSERC grant (No. 283106) and NSFC grants (Nos. 11771386, 11728104). The third author is supported by the Major Key Project of PCL (No. PCL2022A05) and the National Natural Science Foundation of China (No. 12271278). The fifth author is supported by the National Natural Science Foundation of China (No. 12371099).

Author information

Contributions

All authors contributed to the study conception and design. The first draft of the manuscript was written by Yuefang Lian and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Yang Zhou.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial or non-financial interests, or personal relationships, that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A subset of this work (Sect. 3.3) appeared in the proceedings of COCOON 2022 [25].

Appendices

Appendix A

1.1 A.1. The Proof of Lemma 3.2

Proof

We first prove the conclusion for the monotone case. From Lemma 3.1 and the updating rule, we have

$$\begin{aligned} \mathbb {E}\left[ \Vert \Phi _{k}\Vert ^{2}\right]\le & {} \frac{L^{2}}{\vert \mathcal {M}_{k}\vert }\frac{1}{K^{2}}\mathbb {E}\left[ \Vert v_{k}\Vert ^{2}\right] +\mathbb {E}\left[ \Vert \Phi _{k-1}\Vert ^{2}\right] \\\le & {} \frac{kL^{2}R^{2}}{K^{2}\vert \mathcal {M}_{k}\vert }+\mathbb {E}\left[ \Vert \Phi _{0}\Vert ^{2}\right] . \end{aligned}$$

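For clarity, the second inequality above is obtained by applying the first relation recursively \(k\) times, using \(\Vert v_{j}\Vert \le R\) and \(\vert \mathcal {M}_{j}\vert =\vert \mathcal {M}_{k}\vert \) for all \(j\ge 1\), as in the lemma's setting:

$$\begin{aligned} \mathbb {E}\left[ \Vert \Phi _{k}\Vert ^{2}\right] \le \sum ^{k}_{j=1}\frac{L^{2}}{\vert \mathcal {M}_{j}\vert K^{2}}\mathbb {E}\left[ \Vert v_{j}\Vert ^{2}\right] +\mathbb {E}\left[ \Vert \Phi _{0}\Vert ^{2}\right] \le \frac{kL^{2}R^{2}}{K^{2}\vert \mathcal {M}_{k}\vert }+\mathbb {E}\left[ \Vert \Phi _{0}\Vert ^{2}\right] . \end{aligned}$$
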
Furthermore, for \(k=0\) with \(\vert \mathcal {M}_{0}\vert =\frac{\sigma ^{2}K^{2}}{L^{2}R^{2}}\), it is easy to obtain

$$\begin{aligned} \mathbb {E}\left[ \Vert \Phi _{0}\Vert ^{2}\right] \le \frac{\sigma ^{2}}{\vert \mathcal {M}_{0}\vert }=\frac{L^{2}R^{2}}{K^{2}}. \end{aligned}$$

Thus, for \(k\ne 0\) with \(\vert \mathcal {M}_{k}\vert =K\), we have

$$\begin{aligned} \mathbb {E}\left[ \Vert \Phi _{k}\Vert ^{2}\right] \le \frac{kL^{2}R^{2}}{\vert \mathcal {M}_{k}\vert K^{2}}+\frac{L^{2}R^{2}}{K^{2}} \le \left( 1+\frac{k}{K}\right) \frac{L^{2}R^{2}}{K^{2}}. \end{aligned}$$

The conclusion for the non-monotone case follows analogously, using \(\alpha _{k}\le \frac{1}{K}\) and the boundedness of \(\mathcal {C}\). \(\square \)

1.2 A.2. The Proof of Lemma 3.4

Proof

From the updating step in Algorithm 2 and \(v_{k}\in [0,1]^{n}\), it follows that

$$\begin{aligned} x_{k+1}-x_{k}= & {} \frac{(b_{p_{k+1}}-b_{p_{k}})\sqrt{a_{p_{k}}}}{a_{p_{k+1}}}\left( v_{k}-x_{k}\right) \\\le & {} \frac{(b_{p_{k+1}}-b_{p_{k}})\sqrt{a_{p_{k}}}}{a_{p_{k+1}}}(\textbf{1}-x_{k}). \end{aligned}$$

Note that for each \(i\in [n]\),

$$\begin{aligned} 1-(x_{k+1})_{i}\ge & {} \left( 1-\frac{(b_{p_{k+1}}-b_{p_{k}})\sqrt{a_{p_{k}}}}{a_{p_{k+1}}}\right) \left( 1-(x_{k})_{i}\right) \\\ge & {} \left( 1-\frac{b_{p_{k+1}}-b_{p_{k}}}{1+p_{k+1}}\right) \left( 1-(x_{k})_{i}\right) \\\ge & {} \frac{1}{1+p_{k+1}}\left( 1-(x_{0})_{i}\right) , \end{aligned}$$

where \(a_{p_{k}}=(1+p_{k})^{2}\) and \(b_{p_{k}}=p_{k}\); the last inequality follows by telescoping, since \(1-\frac{b_{p_{j+1}}-b_{p_{j}}}{1+p_{j+1}}=\frac{1+p_{j}}{1+p_{j+1}}\) and \(p_{0}=0\). Since the initial point satisfies \(x_{0}\in \arg \min _{x\in \mathcal {C}}\Vert x\Vert _{\infty }\), we then have

$$\begin{aligned} (x_{k})_{i}\le 1-\frac{1}{1+p_{k}}(1-(x_{0})_{i})\le 1-\frac{1}{\sqrt{a_{p_{k}}}}\left( 1-\min _{x\in \mathcal {C}}\Vert x\Vert _{\infty }\right) , \quad i\in [n], \end{aligned}$$

which implies that

$$\begin{aligned} \Vert x_{k}\Vert _{\infty }\le 1-\frac{1-\min _{x\in \mathcal {C}}\Vert x\Vert _{\infty }}{\sqrt{a_{p_{k}}}}. \end{aligned}$$

Furthermore, the inequality (9) follows from Property 2. \(\square \)
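
As a short side calculation (assuming \(p_{k}=\frac{k}{K}\), consistent with \(b_{p_{k}}=\frac{k}{K}\) in Appendix B.2), the step size of Algorithm 2 admits the explicit bound

$$\begin{aligned} \alpha _{k}=\frac{(b_{p_{k+1}}-b_{p_{k}})\sqrt{a_{p_{k}}}}{a_{p_{k+1}}}=\frac{1}{K}\cdot \frac{1+\frac{k}{K}}{\left( 1+\frac{k+1}{K}\right) ^{2}}\le \frac{1}{K}, \qquad \sqrt{a_{p_{k}}}=1+\frac{k}{K}\le 2, \end{aligned}$$

since \(\left( 1+\frac{k+1}{K}\right) ^{2}\ge 1+\frac{k}{K}\) and \(k\le K\); both bounds are used in the proof of Theorem 4.2 below.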

Appendix B

1.1 B.1. The Proof of Theorem 4.1

Proof

By the definition of the potential function (4), Lemma 3.3 and the Cauchy–Schwarz inequality, we obtain

$$\begin{aligned}{} & {} E(p_{k+1})-E(p_{k})\\{} & {} \quad \ge -\frac{LR^{2}e^{p_{k+1}}}{2K^{2}}-(e^{p_{k+1}}-e^{p_{k}})\mathbb {E}\left[ \Vert \Phi _{k}\Vert \Vert x^{*}-v_{k}\Vert \right] \\{} & {} \quad \ge e^{p_{k+1}}\left( -\frac{LR^{2}}{2K^{2}}-2R\left( 1-e^{({p_{k}}-p_{k+1})}\right) \mathbb {E}\left[ \Vert \Phi _{k}\Vert \right] \right) \\{} & {} \quad \ge e^{p_{k+1}}\left( -\frac{LR^{2}}{2K^{2}}-\frac{2R}{K}\mathbb {E}\left[ \Vert \Phi _{k}\Vert \right] \right) \\{} & {} \quad \ge e^{p_{k+1}}\left( -\frac{LR^{2}}{2K^{2}}-\frac{2R}{K}\frac{\sqrt{\hat{\Theta }}}{\sqrt{k+1}}\right) , \end{aligned}$$

where the third inequality follows from \(p_{k}=\frac{k}{K}\) and \(1-e^{-\frac{1}{K}}\le \frac{1}{K}\), and the last one follows from inequality (13) in Lemma 4.2. Telescoping the above inequality over \(k\in \{0, \ldots , K-1\}\) yields

$$\begin{aligned} E(p_{K})-E(p_{0})= & {} e^{p_{K}}\left( \mathbb {E}\left[ F(x_{K})\right] -F(x^{*})\right) -e^{p_{0}}\left( \mathbb {E}\left[ F(x_{0})\right] -F(x^{*})\right) \\\ge & {} -\sum ^{K}_{k=1}e^{p_{k}}\left( \frac{LR^{2}}{2K^{2}}+\frac{2R}{K}\frac{\sqrt{\hat{\Theta }}}{\sqrt{k}}\right) \\\ge & {} -e^{p_{K}}\left( \frac{LR^{2}}{2K}+\frac{2R\sqrt{\hat{\Theta }}}{K}\cdot 2\sqrt{K}\right) , \end{aligned}$$

where we use \(p_{K}\ge p_{k}\) for all \(k\le K\) and \(\sum ^{K}_{k=1}k^{-\frac{1}{2}}\le \int ^{K}_{0}x^{-\frac{1}{2}}dx=2K^{\frac{1}{2}}\). Rearranging the inequality, we have

$$\begin{aligned} \mathbb {E}[F(x_{K})]\ge & {} F(x^{*})+\frac{e^{p_{0}}\left( \mathbb {E}\left[ F(x_{0})\right] -F(x^{*})\right) -e^{p_{K}}\left( \frac{LR^{2}}{2K}+\frac{2R\sqrt{\hat{\Theta }}}{K}\cdot 2\sqrt{K}\right) }{e^{p_{K}}}\\\ge & {} \left( 1-\frac{1}{e}\right) F(x^{*})-\frac{LR^{2}}{2K}-\frac{4R\sqrt{\hat{\Theta }}}{\sqrt{K}}, \end{aligned}$$

where \(F(x_{0})\ge 0, e^{p_{0}}=1\) and \(e^{p_{K}}=e\). \(\square \)
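
To connect this bound with the guarantee stated in the abstract: whenever \(K\) is chosen large enough that \(\frac{LR^{2}}{2K}\le \frac{\varepsilon }{2}\) and \(\frac{4R\sqrt{\hat{\Theta }}}{\sqrt{K}}\le \frac{\varepsilon }{2}\), the conclusion reads

$$\begin{aligned} \mathbb {E}\left[ F(x_{K})\right] \ge \left( 1-\frac{1}{e}\right) F(x^{*})-\varepsilon , \end{aligned}$$

which is the \((1-e^{-1})\text {OPT}-\varepsilon \) approximation of Theorem 4.1; the precise choice of \(K\) in terms of \(\varepsilon \) depends on the constant \(\hat{\Theta }\) from Lemma 4.2.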

1.2 B.2. The Proof of Theorem 4.2

Proof

Note that the parameter \(\alpha _{k}=\frac{(p_{k+1}-p_{k})(1+p_{k})}{(1+p_{k+1})^{2}}\le \frac{1}{K}\), which satisfies the condition of Lemma 4.2 for the non-monotone case. From the definition of the potential function (7) and Lemma 3.5, we have

$$\begin{aligned}{} & {} E(p_{k+1})-E(p_{k})\\{} & {} \quad \ge -\frac{LD^{2}{\alpha _{k}}^{2}a_{p_{k+1}}}{2}-(b_{p_{k+1}}-b_{p_{k}})\sqrt{a_{p_{k}}}\cdot \mathbb {E}\left[ \langle \nabla F(x_{k})-g_{k}, x^{*}-v_{k}\rangle \right] \\{} & {} \quad \ge -\frac{LD^{2}a_{p_{k}}(b_{p_{k+1}}-b_{p_{k}})^{2}}{2a_{p_{k+1}}}-(b_{p_{k+1}}-b_{p_{k}})\sqrt{a_{p_{k}}}\cdot \mathbb {E}\left[ \Vert \Phi _{k}\Vert \Vert x^{*}-v_{k}\Vert \right] \\{} & {} \quad \ge -\frac{LD^{2}}{2K^{2}}-\frac{2D}{K}\frac{\sqrt{\bar{\Theta }}}{\sqrt{k+1}}, \end{aligned}$$

where we use \(\alpha _{k} =\frac{(b_{p_{k+1}}-b_{p_{k}})\sqrt{a_{p_{k}}}}{a_{p_{k+1}}}\) and the Cauchy–Schwarz inequality in the second inequality, and the last inequality follows from \(a_{p_{k+1}} \ge a_{p_{k}}\), \(b_{p_{k}}=\frac{k}{K}\), \(\sqrt{a_{p_{k}}}\le 2\), \(\max _{x, y\in \mathcal {C}}\Vert x-y\Vert \le D\) and Lemma 4.2 for the non-monotone case.

Summing up the above inequality over \(k\in \{0, \ldots , K-1\}\) yields

$$\begin{aligned}{} & {} E(p_{K})-E(p_{0})\\{} & {} \quad =a_{p_{K}}\mathbb {E}\left[ F(x_{K})\right] -(b_{p_{K}}-b_{p_{0}})\left( 1-\min _{x\in \mathcal {C}}{\Vert x\Vert _{\infty }}\right) F(x^{*})-a_{p_{0}}\mathbb {E}\left[ F(x_{0})\right] \\{} & {} \quad \ge -\sum ^{K-1}_{k=0}\left( \frac{LD^{2}}{2K^{2}}+\frac{2D}{K}\frac{\sqrt{\bar{\Theta }}}{\sqrt{k+1}}\right) \\{} & {} \quad \ge -\frac{LD^{2}}{2K}-\frac{2D\sqrt{\bar{\Theta }}}{K}\cdot 2\sqrt{K}, \end{aligned}$$

where the last inequality comes from \(\sum ^{K}_{k=1}k^{-\frac{1}{2}}\le \int ^{K}_{0}x^{-\frac{1}{2}}dx=2K^{\frac{1}{2}}\). Rearranging the inequality, we obtain

$$\begin{aligned}{} & {} \mathbb {E}\left[ F(x_{K})\right] \\{} & {} \quad \ge \frac{(b_{p_{K}}-b_{p_{0}})\left( 1-\min _{x\in \mathcal {C}}{\Vert x\Vert _{\infty }}\right) F(x^{*})+a_{p_{0}}\mathbb {E}\left[ F(x_{0})\right] -\frac{LD^{2}}{2K}-\frac{4D\sqrt{\bar{\Theta }}}{\sqrt{K}}}{a_{p_{K}}}\\{} & {} \quad \ge \frac{1}{4}\left( 1-\min _{x\in \mathcal {C}}{\Vert x\Vert _{\infty }}\right) F(x^{*})-\frac{LD^{2}}{8K}-\frac{D\sqrt{\bar{\Theta }}}{\sqrt{K}}, \end{aligned}$$

where the last inequality follows from \(b_{p_{K}}=1, b_{p_{0}}=0, a_{p_{K}}=4\) and the non-negativity of F, which allows us to drop the term \(a_{p_{0}}\mathbb {E}\left[ F(x_{0})\right] \). \(\square \)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lian, Y., Du, D., Wang, X. et al. Stochastic Variance Reduction for DR-Submodular Maximization. Algorithmica 86, 1335–1364 (2024). https://doi.org/10.1007/s00453-023-01195-z
