
Achieving generalization of deep learning models in a quick way by adapting T-HTR learning rate scheduler

  • Original Article
  • Published:
Personal and Ubiquitous Computing

Abstract

Deep neural network training involves multiple hyperparameters that affect the prediction or classification accuracy of the model. Among these, the learning rate plays a key role in training the network effectively, and several researchers have attempted to design learning rate schedulers that find an optimal learning rate. In this paper, the performance of the existing state-of-the-art learning rate schedulers HTD (hyperbolic tangent decay) and CLR (cyclical learning rate) is investigated with LSTM (long short-term memory) and BiLSTM (bidirectional long short-term memory) architectures. These schedulers do not achieve the best prediction accuracy when tested on three benchmark datasets: 20Newsgroup, Reuters Newswire, and IMDB. To address this issue, the T-HTR (toggle between hyperbolic tangent decay and triangular mode with restarts) learning rate scheduler is proposed and examined. The proposed scheduler toggles the learning rate policy between epochs: as training progresses through each epoch, a new learning rate is computed from the difference between the gradient values of the previous two iterations. Apart from the learning rate, the step width also affects the accuracy of the model; when the step width is set to its minimum, accuracy improves. Furthermore, the network converges in fewer iterations with better accuracy. The overall performance of the proposed system is comparatively improved, as shown in our experimental results.
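
To make the schedules concrete, the sketch below shows the hyperbolic tangent decay and triangular CLR formulas from the cited works, together with one possible reading of the proposed T-HTR toggle. This is an illustrative Python sketch, not the authors' implementation: the per-epoch alternation rule, the gradient-difference adjustment, and every constant in it are assumptions.

```python
import math

# Illustrative sketch only (not the authors' code). HTD and triangular CLR follow
# the formulas in the cited papers; the T-HTR toggle and its gradient-difference
# adjustment are a guess at the idea described in the abstract.

def htd_lr(t, total_steps, lr_min=1e-4, lr_max=1e-2, lower=-6.0, upper=3.0):
    """Hyperbolic tangent decay (HTD): smooth decay from lr_max towards lr_min."""
    progress = t / total_steps
    return lr_min + (lr_max - lr_min) / 2.0 * (1.0 - math.tanh(lower + (upper - lower) * progress))

def clr_triangular_lr(t, step_width, lr_min=1e-4, lr_max=1e-2):
    """Cyclical learning rate (CLR), triangular policy: lr bounces between lr_min and lr_max."""
    cycle = math.floor(1 + t / (2 * step_width))
    x = abs(t / step_width - 2 * cycle + 1)
    return lr_min + (lr_max - lr_min) * max(0.0, 1.0 - x)

def t_htr_lr(epoch, t, total_steps, step_width, grad_norm_prev, grad_norm_prev2):
    """Toggle between HTD and triangular mode from one epoch to the next, then nudge
    the rate by the difference between the gradient norms of the last two iterations
    (sign convention and the 0.1 factor are assumptions, not taken from the paper)."""
    base = htd_lr(t, total_steps) if epoch % 2 == 0 else clr_triangular_lr(t, step_width)
    grad_diff = grad_norm_prev - grad_norm_prev2
    return max(1e-6, base * (1.0 - 0.1 * math.copysign(1.0, grad_diff)))

# Example: learning rate over the first iterations of epoch 1 (triangular mode).
if __name__ == "__main__":
    for t in range(0, 2001, 500):
        print(t, round(t_htr_lr(1, t, total_steps=10000, step_width=1000,
                                grad_norm_prev=0.8, grad_norm_prev2=1.0), 6))
```

Under this reading, a small step_width makes the triangular phase cycle more often, which matches the abstract's observation that a minimal step width improves accuracy and speeds up convergence.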




References

  1. Xie Y, Le L, Zhou Y, Raghavan VV. Deep learning for natural language processing. Analytics and Data Science Institute, Kennesaw

  2. Wei J (2019) Forget the learning rate, decay loss. Hunan University of Technology, China. arXiv:1905.00094 [cs.LG]

  3. Hsueh B-Y, Li W, Wu I-C (2019) Stochastic gradient descent with hyperbolic-tangent decay on classification. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). https://doi.org/10.1109/WACV.2019.00052

  4. Smith LN (2017) Cyclical learning rates for training neural networks. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). https://doi.org/10.1109/WACV.2017.58

  5. Lai S, Xu L, Liu K, Zhao J. Recurrent convolutional neural networks for text classification. National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, China

  6. Yu Y, Si X, Hu C, Zhang J (2019) A review of recurrent neural networks: LSTM cells and network architectures. Neural Computation 31(7)

  7. Zhou Q, Zhang Z, Wu H (2018) NLP at IEST 2018: BiLSTM-attention and LSTM-attention via soft voting in emotion classification. In: Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp 189–194

  8. Smith LN (2018) A disciplined approach to neural network hyper-parameters: learning rate, batch size, momentum, and weight decay. US Naval Research Laboratory Technical Report 5510-026

  9. Brownlee J (2020) Hyperparameter optimization with random search and grid search. https://machinelearningmastery.com/hyperparameter-optimization-with-random-search-and-grid-search/

  10. Johnson F, Valderrama A, Valle C, Crawford B, Soto R, Ñanculef R (2020) Automating configuration of convolutional neural network hyperparameters using genetic algorithm. IEEE Access. https://doi.org/10.1109/ACCESS.2020.3019245

  11. Midilli YE, Parsutins S (2020) Optimization of deep learning hyperparameters with experimental design in exchange rate prediction. In: 2020 61st International Scientific Conference on Information Technology and Management Science of Riga Technical University (ITMS). https://doi.org/10.1109/ITMS51158.2020.9259300

  12. Victoria AH, Maragatham G (2020) Automatic tuning of hyperparameters using Bayesian optimization. Evolving Systems, Springer

  13. Badriyah T, Santoso DB, Syarif I (2019) Deep learning algorithm for data classification with hyperparameter optimization method. J Phys Conf Ser 1193:012033. IOP Publishing. https://doi.org/10.1088/1742-6596/1193/1/012033


  14. Liessner R, Schmitt J, Dietermann A, Baker B (2019) Hyperparameter optimization for deep reinforcement learning in vehicle energy management. In: Proceedings of the 11th International Conference on Agents and Artificial Intelligence (ICAART 2019), pp 134–144. https://doi.org/10.5220/0007364701340144

  15. Yoo Y (2019) Hyperparameter optimization of deep neural network using univariate dynamic encoding algorithm for searches. Knowledge-Based Systems 178:74–83. https://doi.org/10.1016/j.knosys.2019.04.019

  16. You K, Long M, Wang J, Jordan MI (2019) How does learning rate decay help modern neural networks? arXiv:1908.01878 [cs.LG]

  17. Rolínek M, Martius G (2018) L4: practical loss-based stepsize adaptation for deep learning. arXiv:1802.05074 [cs.LG]

  18. Konar J, Khandelwal P, Tripathi R (2020) Comparison of various learning rate scheduling techniques on convolutional neural network. In: 2020 IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS). https://doi.org/10.1109/SCEECS48394.2020.94

  19. Step size matters in deep learning. In: NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2018, pp 3440–3448. https://doi.org/10.5555/3327144.3327262

  20. Tan C, Ma S, Dai Y-H, Qian Y (2016) Barzilai-Borwein step size for stochastic gradient descent. In: 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain. arXiv:1605.04131 [math.OC]

  21. Loshchilov I, Hutter F (2017) SGDR: stochastic gradient descent with warm restarts. In: International Conference on Learning Representations (ICLR), Toulon, France. arXiv:1608.03983

  22. An W, Wang H, Zhang Y, Dai Q (2017) Exponential decay sine wave learning rate for fast deep neural network training. In: 2017 IEEE Visual Communications and Image Processing (VCIP), pp 1–4

  23. Liu P, Wang X (2018) Distance approach for open information extraction based on word vector. KSII Transactions on Internet and Information Systems 12(6). https://doi.org/10.3837/tiis.2018.06.003

  24. Shi Y, Zhu L, Li W, Guo K, Zheng Y (2019) Survey on classic and latest textual sentiment analysis articles and techniques. International Journal of Information Technology and Decision Making (IJITDM) 18(4):1243–1287. World Scientific Publishing Co. Pte. Ltd

  25. Chakravarthy A, Desai P, Deshmukh S, Gawande S, Saha I, Patel M (2018) Hybrid architecture for sentiment analysis using deep learning. 9(1):735–738. ISSN 0976-5697

  26. Salloum SA, Al-Emran M, Monem AA, Shaalan K (2017) A survey of text mining in social media: Facebook and Twitter perspectives. Adv Sci Technol Eng Syst J 2(1):127–133

  27. Mitchell T (1999) Twenty Newsgroups dataset. https://archive.ics.uci.edu/ml/datasets/Twenty+Newsgroups. Accessed 9 Sept 1999

  28. Lewis DD (1997) Reuters-21578 text categorization collection data set. AT&T Labs – Research. https://archive.ics.uci.edu/ml/datasets/reuters-21578+text+categorization+collection. Accessed 16 Feb 1999

  29. Maas AL et al (2011) IMDB movie reviews sentiment dataset. Stanford AI repository. https://www.kaggle.com/jcblaise/imdb-sentiments. Accessed Jun 2011


Author information

Corresponding author

Correspondence to D. Vidyabharathi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Vidyabharathi, D., Mohanraj, V., Kumar, J.S. et al. Achieving generalization of deep learning models in a quick way by adapting T-HTR learning rate scheduler. Pers Ubiquit Comput 27, 1335–1353 (2023). https://doi.org/10.1007/s00779-021-01587-4



  • DOI: https://doi.org/10.1007/s00779-021-01587-4

