
The Impact of Learning Rate Decay and Periodical Learning Rate Restart on Artificial Neural Network

Published: 30 July 2021

Abstract

The learning rate is widely regarded as one of the most important hyper-parameters in model training. In this paper, two typical adjustment strategies, learning rate decay and periodical learning rate restart, are tested on artificial neural networks (ANNs) and compared with a fixed learning rate. Experiments demonstrate that the learning rate adjustment strategies outperform the fixed learning rate in model training, offering faster convergence, higher validation accuracy and lower training loss. In addition, the periodical learning rate restart strategy tends to require fewer epochs than learning rate decay to reach the same accuracy. Thus, raising the learning rate again at appropriate points during training can help the model fit better and achieve strong performance.
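For concreteness, the two strategies compared in the paper can be viewed as simple epoch-to-learning-rate schedules. The Python sketch below pairs a common step variant of learning rate decay with an SGDR-style cosine schedule for periodical restart; the function names, decay factor, cycle length, and rate bounds are illustrative assumptions, not values from the paper.

```python
import math

def step_decay(epoch, lr0=0.1, drop=0.5, epochs_per_drop=10):
    # Learning rate decay (step variant): multiply the initial rate
    # by `drop` once every `epochs_per_drop` epochs.
    return lr0 * drop ** (epoch // epochs_per_drop)

def cosine_restart(epoch, lr_max=0.1, lr_min=1e-4, cycle=10):
    # Periodical restart (SGDR-style): cosine-anneal from lr_max toward
    # lr_min within a cycle, then jump back to lr_max at the next cycle.
    t = epoch % cycle  # position within the current cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / cycle))

# Compare the two schedules against a fixed rate over 30 epochs.
for epoch in range(30):
    print(f"epoch {epoch:2d}: fixed=0.1000  "
          f"decay={step_decay(epoch):.4f}  restart={cosine_restart(epoch):.4f}")
```

Note that the restart schedule is the only one of the three that ever increases the learning rate, which is the behavior the abstract credits with reaching a given accuracy in fewer epochs.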




Published In

AIEE '21: Proceedings of the 2021 2nd International Conference on Artificial Intelligence in Electronics Engineering
January 2021
102 pages
ISBN:9781450389273
DOI:10.1145/3460268

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. ANN
  2. Cyclical learning rate
  3. Learning rate adjustment
  4. Learning rate decay
  5. Stochastic gradient descent with warm restarts

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

AIEE 2021

