Abstract:
Preconditioning is a technique widely used to accelerate the convergence of optimization algorithms. Recently proposed efficient second-order algorithms (such as KFAC) showed that preconditioning the gradient using the curvature information of the loss function can help achieve faster convergence. However, their practicality in large-scale deep learning is still limited by high computational and storage costs. In this work, we propose a stochastic adaptive gradient algorithm, called Mini-Block Adaptive Gradient (MBAG), that addresses these computational challenges in computing the preconditioning matrix. To reduce the per-iteration cost, MBAG analytically computes the inverse of the preconditioning matrix using the matrix inversion lemma and then approximately finds its square root using an iterative solver. Further, to mitigate the storage requirement, MBAG partitions the model parameters into small subsets and computes only the sub-blocks of the preconditioner associated with each subset. This greatly improves the scalability of the proposed algorithm. The performance of MBAG is compared to that of popular first- and second-order algorithms on auto-encoder and classification tasks using real datasets.
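
A minimal sketch of the two preconditioning steps described above, assuming each mini-block preconditioner has the low-rank-plus-damping form P = G G^T / m + lam * I built from a b-by-m matrix G of stacked recent gradients; this form, the function names, and the choice of a Newton-Schulz iteration as the iterative square-root solver are illustrative assumptions, not details given in the abstract.

import numpy as np

def block_inverse(G, lam):
    # Invert the assumed mini-block preconditioner P = G @ G.T / m + lam * I
    # (b x b) via the matrix inversion lemma (Woodbury identity), so only an
    # m x m linear system is solved instead of a b x b one.
    b, m = G.shape
    inner = lam * np.eye(m) + G.T @ G / m        # small m x m core matrix
    return (np.eye(b) - G @ np.linalg.solve(inner, G.T) / m) / lam

def newton_schulz_sqrt(A, iters=15):
    # Approximate the square root of a symmetric positive-definite matrix A
    # with the coupled Newton-Schulz iteration (one possible iterative solver;
    # the paper's actual choice is not specified in the abstract).
    n = A.shape[0]
    norm = np.linalg.norm(A)                     # scale so the iteration converges
    Y, Z = A / norm, np.eye(n)
    for _ in range(iters):
        T = 0.5 * (3.0 * np.eye(n) - Z @ Y)
        Y, Z = Y @ T, T @ Z
    return Y * np.sqrt(norm)                     # undo the scaling

# Usage sketch: precondition the stochastic gradient of one mini-block.
rng = np.random.default_rng(0)
b, m, lam = 64, 8, 1e-3                          # block size, gradient history, damping
G = rng.standard_normal((b, m))                  # placeholder gradient history
g = rng.standard_normal(b)                       # current mini-block gradient
P_inv = block_inverse(G, lam)                    # analytic inverse via the lemma
update = newton_schulz_sqrt(P_inv) @ g           # iterative square root of the inverse, then precondition

Because the inverse and its square root are formed block by block, only small b x b matrices are ever stored, which is the storage saving that the mini-block partitioning is meant to provide.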
Date of Conference: 17-20 December 2022
Date Added to IEEE Xplore: 26 January 2023