Abstract
The perturbed gradient descent (PGD) method, which adds random noise to the search direction, has been widely used to solve large-scale optimization problems owing to its ability to escape saddle points. However, it can be inefficient for two reasons. First, the random noise may not point in a descent direction, so PGD may still stagnate around saddle points. Second, the magnitude of the noise, which is controlled by the radius of the perturbation ball, may not be properly configured, so convergence is slow. In this paper, we propose a method called RPH-PGD (Randomly Projected Hessian for Perturbed Gradient Descent) to improve the performance of PGD. The randomly projected Hessian (RPH) is created by projecting the Hessian matrix onto a relatively small subspace that contains rich information about the eigenvectors of the original Hessian. RPH-PGD uses the eigenvalues and eigenvectors of the randomly projected Hessian to identify negative curvature directions, and uses the matrix itself to estimate how the Hessian changes between iterations, which is the information needed to adjust the perturbation radius dynamically during the computation. In addition, RPH-PGD employs the finite-difference method to approximate Hessian-vector products instead of constructing the Hessian explicitly. An amortized analysis shows that the time complexity of RPH-PGD is only slightly higher than that of PGD. Experimental results show that RPH-PGD not only converges faster than PGD, but also converges in cases where PGD cannot.
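To make the two building blocks mentioned above concrete, the following is a minimal NumPy sketch (not the authors' exact algorithm) of a finite-difference Hessian-vector product and of extracting eigenpairs from a Hessian projected onto a small random subspace; the function names, parameters, and the toy quadratic are illustrative assumptions.

```python
import numpy as np

def hvp_finite_difference(grad_fn, x, v, eps=1e-5):
    """Approximate the Hessian-vector product H(x) @ v by a central
    difference of the gradient, without forming the Hessian explicitly."""
    return (grad_fn(x + eps * v) - grad_fn(x - eps * v)) / (2.0 * eps)

def randomly_projected_hessian(grad_fn, x, k, eps=1e-5, seed=None):
    """Project the Hessian onto a random k-dimensional subspace and return
    the eigenpairs of the small projected matrix (Rayleigh-Ritz style)."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # Orthonormal basis Q of a random k-dimensional subspace.
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    # H @ Q, computed one Hessian-vector product per basis vector.
    HQ = np.column_stack([hvp_finite_difference(grad_fn, x, Q[:, j], eps)
                          for j in range(k)])
    # Small projected Hessian B = Q^T H Q, symmetrized for numerical safety.
    B = Q.T @ HQ
    B = 0.5 * (B + B.T)
    evals, evecs = np.linalg.eigh(B)
    # Lift the Ritz vectors back to the original space.
    return evals, Q @ evecs

# Toy usage on a saddle-shaped quadratic f(x) = 0.5 * x^T A x.
A = np.diag([1.0, -0.5, 2.0, -1.5])
grad_fn = lambda x: A @ x
x0 = np.ones(4)
evals, vecs = randomly_projected_hessian(grad_fn, x0, k=3, seed=0)
print("approximate eigenvalues:", evals)  # negative entries flag negative curvature
```

In this sketch, a negative approximate eigenvalue indicates a direction of negative curvature that a perturbed-gradient step could exploit; how the actual method selects the subspace and updates the perturbation radius is described in the paper.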
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Li, CC., Huang, J., Hon, WK., Lee, CR. (2024). RPH-PGD: Randomly Projected Hessian for Perturbed Gradient Descent. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science(), vol 14646. Springer, Singapore. https://doi.org/10.1007/978-981-97-2253-2_20
Print ISBN: 978-981-97-2252-5
Online ISBN: 978-981-97-2253-2