
Proximal Gradient Methods for General Smooth Graph Total Variation Model in Unsupervised Learning

Published in: Journal of Scientific Computing

Abstract

Graph total variation methods have proven to be powerful tools for unstructured data classification. Existing algorithms, such as the MBO (Merriman, Bence, and Osher) scheme, can solve such problems very efficiently with the help of the Nyström approximation; however, a rigorous convergence theory is still lacking precisely because of that approximation. In this paper, we design a fast operator-splitting algorithm with a low memory footprint and a strict convergence guarantee for two-phase unsupervised classification. We first present a general smooth graph total variation model consisting of four terms: a Lipschitz-differentiable regularization term, a general double-well potential term, a balance term, and a boundedness constraint. We then design proximal gradient methods, without and with acceleration, whose per-iteration computational cost is low because the associated proximal operators admit closed-form solutions. Convergence is further analyzed under quite mild conditions. We conduct numerical experiments on two data sets, the synthetic two-moons data and MNIST, to evaluate the performance and convergence of the proposed algorithms. The results demonstrate the convergence and robustness of the proposed algorithms.
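The proximal gradient iterations described above, with and without acceleration, follow a standard pattern. Below is a minimal sketch, not the authors' actual model or implementation: it assumes a generic composite objective f(u) + g(u) where f is smooth with an L-Lipschitz gradient, and uses the indicator of a box (a stand-in for the boundedness constraint) as g, so its proximal operator is a closed-form projection. The accelerated variant uses FISTA-style Nesterov momentum in the spirit of refs. [3, 32].

```python
import numpy as np

def prox_box(u, lo=-1.0, hi=1.0):
    """Prox of the indicator of the box [lo, hi]^n: closed-form projection."""
    return np.clip(u, lo, hi)

def proximal_gradient(grad_f, prox_g, u0, step, n_iter=500):
    """Basic proximal gradient: u_{k+1} = prox_g(u_k - step * grad_f(u_k))."""
    u = u0.copy()
    for _ in range(n_iter):
        u = prox_g(u - step * grad_f(u))
    return u

def accelerated_proximal_gradient(grad_f, prox_g, u0, step, n_iter=500):
    """FISTA-style accelerated variant with Nesterov momentum."""
    u, v, t = u0.copy(), u0.copy(), 1.0
    for _ in range(n_iter):
        u_new = prox_g(v - step * grad_f(v))          # proximal step at the extrapolated point
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        v = u_new + ((t - 1.0) / t_new) * (u_new - u)  # momentum extrapolation
        u, t = u_new, t_new
    return u
```

For example, with the quadratic f(u) = ½‖u − c‖², one has grad_f(u) = u − c and may take step = 1/L = 1; both iterations then converge to the projection of c onto the box.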

[Figures 1–5 appear in the published article.]

Data availability

Enquiries about data availability should be directed to the authors.

Notes

  1. More details, including properties and examples of KL functions, can be found in Sect. 2.4 and the appendix of [1].

  2. “MNIST” data set can be obtained from http://yann.lecun.com/exdb/mnist/.

References

  1. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137(1–2), 91–129 (2013)

  2. Balashov, M.V.: The gradient projection algorithm for smooth sets and functions in nonconvex case. Set-Valued Var. Anal. 29, 341–360 (2021)

  3. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  4. Bertozzi, A.L., Flenner, A.: Diffuse interface models on graphs for classification of high dimensional data. Multiscale Model. Simul. 10(3), 1090–1118 (2012)

  5. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014)

  6. Bosch, J., Klamt, S., Stoll, M.: Generalizing diffuse interface methods on graphs: nonsmooth potentials and hypergraphs. SIAM J. Appl. Math. 78(3), 1350–1377 (2018)

  7. Boyd, Z.M., Bae, E., Tai, X., Bertozzi, A.L.: Simplified energy landscape for modularity using total variation. SIAM J. Appl. Math. 78(5), 2439–2464 (2018)

  8. Brandes, U., Delling, D., Gaertler, M., Gorke, R., Hoefer, M., Nikoloski, Z., Wagner, D.: On modularity clustering. IEEE Trans. Knowl. Data Eng. 20(2), 172–188 (2008)

  9. Bühler, T., Hein, M.: Spectral clustering based on the graph p-Laplacian. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 81–88. Association for Computing Machinery, New York, NY, USA (2009)

  10. Chang, H., Glowinski, R., Marchesini, S., Tai, X.C., Wang, Y., Zeng, T.: Overlapping domain decomposition methods for ptychographic imaging. SIAM J. Sci. Comput. 43(3), B570–B597 (2021)

  11. Chang, H., Marchesini, S.: A general framework for denoising phaseless diffraction measurements. CoRR arXiv:1611.01417 (2016)

  12. Chung, F.R.K.: Spectral graph theory. In: CBMS Regional Conference Series in Mathematics (1997)

  13. Dong, B.: Sparse representation on graphs by tight wavelet frames and applications. Appl. Comput. Harmon. Anal. 42(3), 452–479 (2017)

  14. Elmoataz, A., Lezoray, O., Bougleux, S.: Nonlocal discrete regularization on weighted graphs: A framework for image and manifold processing. IEEE Trans. Image Process. 17(7), 1047–1060 (2008)

  15. Feng, S., Huang, W., Song, L., Ying, S., Zeng, T.: Proximal gradient method for nonconvex and nonsmooth optimization on Hadamard manifolds. Optim. Lett. 6, 1862–4480 (2021)

  16. van Gennip, Y., Bertozzi, A.L.: \(\Gamma \)-convergence of graph Ginzburg-Landau functionals. Adv. Differ. Equ. 17(11), 1115–1180 (2012)

  17. Glowinski, R., Osher, S.J., Yin, W.: Splitting Methods in Communication, Imaging, Science, and Engineering. Springer, Cham (2016)

  18. Glowinski, R., Pan, T.W., Tai, X.C.: Some facts about operator-splitting and alternating direction methods. In: Glowinski, R., Osher, S.J., Yin, W. (eds.) Splitting Methods in Communication, Imaging, Science, and Engineering, pp. 19–94. Springer, Cham (2016)

  19. Goldstein, T., Studer, C., Baraniuk, R.G.: A field guide to forward-backward splitting with a FASTA implementation. CoRR arXiv:1411.3406 (2014)

  20. Hu, H., Laurent, T., Porter, M.A., Bertozzi, A.L.: A method based on total variation for network modularity optimization using the MBO scheme. SIAM J. Appl. Math. 73(6), 2224–2246 (2013)

  21. Huang, Y., Shen, Z., Cai, F., Li, T., Lv, F.: Adaptive graph-based generalized regression model for unsupervised feature selection. Knowl.-Based Syst. 227, 107156 (2021)

  22. Jia, F., Tai, X.C., Liu, J.: Nonlocal regularized CNN for image segmentation. Inverse Probl. Imaging 14(5), 891–911 (2020)

  23. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2323 (1998)

  24. Li, F., Ng, M.K.: Image colorization by using graph bi-Laplacian. Adv. Comput. Math. 45(3), 1521–1549 (2019)

  25. Li, H., Lin, Z.: Accelerated proximal gradient methods for nonconvex programming. In: C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, R. Garnett (eds.) Advances in Neural Information Processing Systems, vol. 28, pp. 379–387. Curran Associates, Inc. (2015)

  26. Li, J., Zhao, J., Wang, Q.: Energy and entropy preserving numerical approximations of thermodynamically consistent crystal growth models. J. Comput. Phys. 382, 202–220 (2019)

  27. Liu, J., Zheng, X.: A block nonlocal TV method for image restoration. SIAM J. Imaging Sci. 10(2), 920–941 (2017)

  28. von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)

  29. Merkurjev, E., Kostić, T., Bertozzi, A.L.: An MBO scheme on graphs for classification and image processing. SIAM J. Imaging Sci. 6(4), 1903–1930 (2013)

  30. Merriman, B., Bence, J.K., Osher, S.J.: Diffusion-generated motion by mean curvature for filaments. In: J. Taylor (ed.) Proceedings of the Computational Crystal Growers Workshop, pp. 73–83. AMS (1992)

  31. Muehlebach, M., Jordan, M.: A dynamical systems perspective on Nesterov acceleration. In: K. Chaudhuri, R. Salakhutdinov (eds.) Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 97, pp. 4656–4662. PMLR (2019)

  32. Nesterov, Y.: A method for solving the convex programming problem with convergence rate \(O(1/k^2)\). Proc. USSR Academy Sci. 269, 543–547 (1983)

  33. Ng, A.Y., Jordan, M.I., Weiss, Y.: On spectral clustering: Analysis and an algorithm. In: Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, NIPS’01, pp. 849-856. MIT Press, Cambridge (2001)

  34. O'Donoghue, B., Candès, E.J.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15(3), 715–732 (2015)

  35. Peressini, A.L., Sullivan, F.E., Uhl, J.J.: The Mathematics of Nonlinear Programming. Springer, New York (1988)

  36. Qin, J., Lee, H., Chi, J.T., Drumetz, L., Chanussot, J., Lou, Y., Bertozzi, A.L.: Blind hyperspectral unmixing based on graph total variation regularization. IEEE Trans. Geosci. Remote Sensing 59(4), 3338–3351 (2021)

  37. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D Nonlinear Phenom. 60(1–4), 259–268 (1992)

  38. Shang, R., Wang, L., Shang, F., Jiao, L., Li, Y.: Dual space latent representation learning for unsupervised feature selection. Pattern Recognit. 114, 107873 (2021)

  39. Shang, R., Zhang, X., Feng, J., Li, Y., Jiao, L.: Sparse and low-dimensional representation with maximum entropy adaptive graph for feature selection. Neurocomputing 485, 57–73 (2022)

  40. Shen, J., Xu, J., Yang, J.: A new class of efficient and robust energy stable schemes for gradient flows. SIAM Rev. 61(3), 474–506 (2019)

  41. Szlam, A., Bresson, X.: Total variation, Cheeger cuts. In: Proceedings of the 27th International Conference on Machine Learning, pp. 1039–1046 (2010)

  42. Tang, C., Bian, M., Liu, X., Li, M., Zhou, H., Wang, P., Yin, H.: Unsupervised feature selection via latent representation learning and manifold regularization. Neural Netw. 117, 163–178 (2019)

  43. Wen, B., Chen, X., Pong, T.K.: Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. SIAM J. Optim. 27(1), 124–145 (2017)

  44. Wu, T., Li, W., Jia, S., Dong, Y., Zeng, T.: Deep multi-level wavelet-CNN denoiser prior for restoring blurred image with Cauchy noise. IEEE Signal Process. Lett. 27, 1635–1639 (2020). https://doi.org/10.1109/LSP.2020.3023299

  45. Yang, X.F., Zhao, J., Wang, Q.: Numerical approximations for the molecular beam epitaxial growth model based on the invariant energy quadratization method. J. Comput. Phys. 333, 104–127 (2017)

  46. Yao, Q., Kwok, J.T., Gao, F., Chen, W., Liu, T.: Efficient inexact proximal gradient algorithm for nonconvex problems. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp. 3308–3314 (2017)

  47. Yin, K., Tai, X.C.: An effective region force for some variational models for learning and clustering. J. Sci. Comput. 74, 1–22 (2018)

  48. Zelnik-Manor, L., Perona, P.: Self-tuning spectral clustering. In: Advances in Neural Information Processing Systems, vol. 17, pp. 1601–1608 (2004)

  49. Zhou, D., Schölkopf, B.: Regularization on discrete spaces. In: Kropatsch, W.G., Sablatnig, R., Hanbury, A. (eds.) Pattern Recognition, pp. 361–368. Springer, Berlin (2005)

  50. Zhu, W., Chayes, V., Tiard, A., Sanchez, S., Dahlberg, D., Bertozzi, A.L., Osher, S., Zosso, D., Kuang, D.: Unsupervised classification in hyperspectral imagery with nonlocal total variation and primal-dual hybrid gradient algorithm. IEEE Trans. Geosci. Remote Sensing 55(5), 2786–2798 (2017)

Funding

This work was partially supported by the National Natural Science Foundation of China under Awards 11871372 and 11501413, and by the Natural Science Foundation of Tianjin under Award 18JCYBJC16600. BS acknowledges support from the Postgraduate Innovation Research Project of Tianjin under Award 2020YJSS141.

Author information

Corresponding author

Correspondence to Huibin Chang.

Ethics declarations

Competing interests

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sun, B., Chang, H. Proximal Gradient Methods for General Smooth Graph Total Variation Model in Unsupervised Learning. J Sci Comput 93, 2 (2022). https://doi.org/10.1007/s10915-022-01954-0
