Abstract
With the development of artificial intelligence, optimizing the performance of deep neural network models has become an active research topic. The learning rate is one of the most important hyper-parameters for model optimization. In recent years, several learning rate algorithms with a cycle mechanism have been proposed. Most of them adopt warm restarts and a cyclic mechanism so that the learning rate oscillates between two boundary values, and they demonstrate their effectiveness on image classification tasks. To further improve the performance of neural network models and to verify effectiveness on different training tasks, this paper proposes a novel learning rate schedule called hyperbolic tangent polynomial parity cyclic learning rate (HTPPC), which adopts a cycle mechanism and combines the advantages of warm restarts and polynomial decay. The performance of HTPPC is demonstrated on image classification and object detection tasks.
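The exact HTPPC formula is given in the full text; as a rough illustration only, the following minimal sketch shows how a cyclic learning rate schedule can combine a hyperbolic-tangent-shaped decay, a polynomial decay term, and warm restarts. The function name and parameters (cycle_length, lr_max, lr_min, power) are hypothetical and are not taken from the paper.

```python
import math

def cyclic_tanh_poly_lr(step, cycle_length=1000, lr_max=0.1, lr_min=1e-4, power=2.0):
    """Illustrative cyclic schedule (NOT the paper's exact HTPPC formula).

    Within each cycle the rate decays from roughly lr_max to lr_min following a
    tanh-shaped curve modulated by a polynomial term; a warm restart brings the
    rate back to lr_max at every cycle boundary.
    """
    # Position inside the current cycle, in [0, 1); the warm restart happens
    # implicitly when this position wraps back to 0.
    t = (step % cycle_length) / cycle_length
    # tanh-shaped decay factor: close to 1 at the start of a cycle, close to 0 at the end.
    tanh_factor = 0.5 * (1.0 - math.tanh(6.0 * (t - 0.5)))
    # Polynomial decay factor, as in standard polynomial learning rate policies.
    poly_factor = (1.0 - t) ** power
    # Blend the two factors and map the result into [lr_min, lr_max].
    return lr_min + (lr_max - lr_min) * tanh_factor * poly_factor
```

Such a function could be wrapped in a framework's lambda-based scheduler; the shape constants above are illustrative and would be tuned per task.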
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Lin, H., Yang, X., Wu, B., Xiong, R. (2021). Hyperbolic Tangent Polynomial Parity Cyclic Learning Rate for Deep Neural Network. In: Pham, D.N., Theeramunkong, T., Governatori, G., Liu, F. (eds) PRICAI 2021: Trends in Artificial Intelligence. PRICAI 2021. Lecture Notes in Computer Science, vol. 13032. Springer, Cham. https://doi.org/10.1007/978-3-030-89363-7_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89362-0
Online ISBN: 978-3-030-89363-7
eBook Packages: Computer Science (R0)