RPH-PGD: Randomly Projected Hessian for Perturbed Gradient Descent

  • Conference paper
  • Advances in Knowledge Discovery and Data Mining (PAKDD 2024)

Abstract

The perturbed gradient descent (PGD) method, which adds random noise to the search directions, has been widely used for solving large-scale optimization problems owing to its ability to escape saddle points. However, it can be inefficient for two reasons. First, the random noise may not point in a descent direction, so PGD may still stagnate around saddle points. Second, the magnitude of the noise, which is controlled by the radius of the perturbation ball, may not be properly configured, so convergence is slow. In this paper, we propose a method called RPH-PGD (Randomly Projected Hessian for Perturbed Gradient Descent) to improve the performance of PGD. The randomly projected Hessian (RPH) is created by projecting the Hessian matrix onto a relatively small subspace that contains rich information about the eigenvectors of the original Hessian. RPH-PGD uses the eigenvalues and eigenvectors of the randomly projected Hessian to identify negative curvature, and uses the matrix itself to estimate how the Hessian changes, which is the information needed to adjust the radius dynamically during the computation. In addition, RPH-PGD employs the finite difference method to approximate Hessian-vector products instead of constructing the Hessian explicitly. An amortized analysis shows that the time complexity of RPH-PGD is only slightly higher than that of PGD. The experimental results show that RPH-PGD not only converges faster than PGD but also converges in cases where PGD does not.
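The abstract describes three computational ingredients: finite-difference Hessian-vector products, a projection of the Hessian onto a small random subspace, and the use of the projected matrix's eigenpairs to detect negative curvature near saddle points. The NumPy sketch below only illustrates these ideas under simplifying assumptions; the names hvp_fd, projected_hessian, and rph_pgd, the fixed perturbation radius, and the simple escape rule are hypothetical and do not reproduce the paper's algorithm or its dynamic radius adjustment.

```python
import numpy as np

def hvp_fd(grad, x, v, eps=1e-5):
    """Central finite-difference approximation of the Hessian-vector product H(x) @ v."""
    return (grad(x + eps * v) - grad(x - eps * v)) / (2.0 * eps)

def projected_hessian(grad, x, k, rng):
    """Project the Hessian onto a random k-dimensional subspace: returns (Q, Q^T H Q)."""
    n = x.size
    k = min(k, n)
    omega = rng.standard_normal((n, k))
    q, _ = np.linalg.qr(omega)                     # orthonormal basis of the random subspace
    hq = np.column_stack([hvp_fd(grad, x, q[:, i]) for i in range(k)])
    h_small = q.T @ hq
    return q, 0.5 * (h_small + h_small.T)          # symmetrize against finite-difference noise

def rph_pgd(f, grad, x0, lr=1e-2, k=10, radius=1e-1,
            g_tol=1e-6, neg_tol=-1e-4, max_iter=1000, seed=0):
    """Gradient descent that, near stationary points, perturbs along an estimated
    negative-curvature direction from the projected Hessian (a sketch, not the paper's method)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) > g_tol:
            x = x - lr * g                         # ordinary gradient step
            continue
        # Gradient is tiny: look for negative curvature in the random subspace.
        q, h_small = projected_hessian(grad, x, k, rng)
        evals, evecs = np.linalg.eigh(h_small)     # eigenvalues in ascending order
        if evals[0] < neg_tol:
            d = q @ evecs[:, 0]                    # approximate eigenvector of the full Hessian
            step = radius * d                      # move within the perturbation ball
            x = x + step if f(x + step) < f(x - step) else x - step
        else:
            break                                  # looks like an approximate local minimum
    return x

# Example: f has a saddle at the origin and minima at (+1, 0) and (-1, 0).
f = lambda x: (x[0] ** 2 - 1) ** 2 + x[1] ** 2
grad = lambda x: np.array([4 * x[0] * (x[0] ** 2 - 1), 2 * x[1]])
print(rph_pgd(f, grad, x0=np.zeros(2)))            # escapes the saddle, ends near (+/-1, 0)
```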

Author information

Correspondence to Che-Rung Lee.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Li, CC., Huang, J., Hon, WK., Lee, CR. (2024). RPH-PGD: Randomly Projected Hessian for Perturbed Gradient Descent. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol 14646. Springer, Singapore. https://doi.org/10.1007/978-981-97-2253-2_20

  • DOI: https://doi.org/10.1007/978-981-97-2253-2_20

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2252-5

  • Online ISBN: 978-981-97-2253-2

  • eBook Packages: Computer Science, Computer Science (R0)
