
Achieving generalization of deep learning models in a quick way by adapting T-HTR learning rate scheduler

  • Original Article
  • Published:
Personal and Ubiquitous Computing

Abstract

Deep neural network training involves multiple hyperparameters that affect the prediction or classification accuracy of the model. Among these, the learning rate plays a key role in training the network effectively, and several researchers have attempted to design learning rate schedulers that find an optimal learning rate. In this paper, the performance of the existing state-of-the-art learning rate schedulers HTD (hyperbolic tangent decay) and CLR (cyclical learning rate) is investigated with LSTM (long short-term memory) and BiLSTM (bidirectional long short-term memory) architectures. These schedulers do not achieve the best prediction accuracy when tested on three benchmark datasets: 20Newsgroup, Reuters Newswire, and IMDB. To address this issue, the T-HTR (toggle between hyperbolic tangent decay and triangular mode with restarts) learning rate scheduler is proposed and examined. The proposed scheduler toggles the learning rate policy between epochs: as training progresses through each epoch, a new learning rate is computed from the difference between the gradient values of the previous two iterations. Apart from the learning rate, the step width also affects the accuracy of the model; when the step width is set to its minimum, accuracy improves. Furthermore, the network converges in fewer iterations with better accuracy. The overall performance of the proposed system is comparatively improved, as shown in our experimental results.
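
To make the schedules concrete, the sketch below shows the hyperbolic tangent decay and triangular CLR formulas from the cited works, together with one possible reading of the proposed T-HTR toggle. This is an illustrative Python sketch, not the authors' implementation: the per-epoch alternation rule, the gradient-difference adjustment, and every constant in it are assumptions.

```python
import math

# Illustrative sketch only (not the authors' code). HTD and triangular CLR follow
# the formulas in the cited papers; the T-HTR toggle and its gradient-difference
# adjustment are a guess at the idea described in the abstract.

def htd_lr(t, total_steps, lr_min=1e-4, lr_max=1e-2, lower=-6.0, upper=3.0):
    """Hyperbolic tangent decay (HTD): smooth decay from lr_max towards lr_min."""
    progress = t / total_steps
    return lr_min + (lr_max - lr_min) / 2.0 * (1.0 - math.tanh(lower + (upper - lower) * progress))

def clr_triangular_lr(t, step_width, lr_min=1e-4, lr_max=1e-2):
    """Cyclical learning rate (CLR), triangular policy: lr bounces between lr_min and lr_max."""
    cycle = math.floor(1 + t / (2 * step_width))
    x = abs(t / step_width - 2 * cycle + 1)
    return lr_min + (lr_max - lr_min) * max(0.0, 1.0 - x)

def t_htr_lr(epoch, t, total_steps, step_width, grad_norm_prev, grad_norm_prev2):
    """Toggle between HTD and triangular mode from one epoch to the next, then nudge
    the rate by the difference between the gradient norms of the last two iterations
    (sign convention and the 0.1 factor are assumptions, not taken from the paper)."""
    base = htd_lr(t, total_steps) if epoch % 2 == 0 else clr_triangular_lr(t, step_width)
    grad_diff = grad_norm_prev - grad_norm_prev2
    return max(1e-6, base * (1.0 - 0.1 * math.copysign(1.0, grad_diff)))

# Example: learning rate over the first iterations of epoch 1 (triangular mode).
if __name__ == "__main__":
    for t in range(0, 2001, 500):
        print(t, round(t_htr_lr(1, t, total_steps=10000, step_width=1000,
                                grad_norm_prev=0.8, grad_norm_prev2=1.0), 6))
```

Under this reading, a small step_width makes the triangular phase cycle more often, which matches the abstract's observation that a minimal step width improves accuracy and speeds up convergence.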




References

  1. Xie Y, Le L, Zhou Y, Raghavan VV. Deep learning for natural language processing. Analytics and Data Science Institute, Kennesaw

  2. Wei J (2019) Forget the learning rate, decay loss. Hunan University of Technology, China. arXiv:1905.00094 [cs.LG]

  3. Hsueh B-Y, Li W, Wu I-C (2019) Stochastic gradient descent with hyperbolic-tangent decay on classification. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). https://doi.org/10.1109/WACV.2019.00052

  4. Smith LN (2017) Cyclical learning rates for training neural networks. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). https://doi.org/10.1109/WACV.2017.58

  5. Lai S, Xu L, Liu K, Zhao J. Recurrent convolutional neural networks for text classification. National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, China

  6. Yu Y, Si X, Hu C, Zhang J (2019) A review of recurrent neural networks: LSTM cells and network architectures. Neural Computation 31(7)

  7. Zhou Q, Zhang Z, Wu H (2018) NLP at IEST 2018: BiLSTM-attention and LSTM-attention via soft voting in emotion classification. In: Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp 189–194

  8. Smith LN (2018) A disciplined approach to neural network hyper-parameters: learning rate, batch size, momentum, and weight decay. US Naval Research Laboratory Technical Report 5510-026

  9. Brownlee J (2020) Hyperparameter optimization with random search and grid search. https://machinelearningmastery.com/hyperparameter-optimization-with-random-search-and-grid-search/

  10. Johnson F, Valderrama A, Valle C, Crawford B, Soto R, Ñanculef R (2020) Automating configuration of convolutional neural network hyperparameters using genetic algorithm. IEEE Access. https://doi.org/10.1109/ACCESS.2020.3019245

  11. Midilli YE, Parsutins S (2020) Optimization of deep learning hyperparameters with experimental design in exchange rate prediction. In: 2020 61st International Scientific Conference on Information Technology and Management Science of Riga Technical University (ITMS). https://doi.org/10.1109/ITMS51158.2020.9259300

  12. Victoria AH, Maragatham G (2020) Automatic tuning of hyperparameters using Bayesian optimization. Evolving Systems, Springer

  13. Badriyah T, Santoso DB, Syarif I (2019) Deep learning algorithm for data classification with hyperparameter optimization method. J Phys Conf Ser 1193:012033. IOP Publishing. https://doi.org/10.1088/1742-6596/1193/1/012033


  14. Liessner R, Schmitt J, Dietermann A, Baker B (2019) Hyperparameter optimization for deep reinforcement learning in vehicle energy management. In: Proceedings of the 11th International Conference on Agents and Artificial Intelligence (ICAART 2019), pp 134–144. https://doi.org/10.5220/0007364701340144

  15. Yoo Y (2019) Hyperparameter optimization of deep neural network using univariate dynamic encoding algorithm for searches. Knowledge-Based Systems 178:74–83. https://doi.org/10.1016/j.knosys.2019.04.019

  16. You K, Long M, Wang J, Jordan MI (2019) How does learning rate decay help modern neural networks? arXiv:1908.01878 [cs.LG]

  17. Rolínek M, Martius G (2018) L4: practical loss-based stepsize adaptation for deep learning. arXiv:1802.05074 [cs.LG]

  18. Konar J, Khandelwal P, Tripathi R (2020) Comparison of various learning rate scheduling techniques on convolutional neural network. In: 2020 IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS). https://doi.org/10.1109/SCEECS48394.2020.94

  19. Step size matters in deep learning. In: NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2018, pp 3440–3448. https://doi.org/10.5555/3327144.3327262

  20. Tan C, Ma S, Dai Y-H, Qian Y (2016) Barzilai-Borwein step size for stochastic gradient descent. In: 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain. arXiv:1605.04131 [math.OC]

  21. Loshchilov I, Hutter F (2017) SGDR: stochastic gradient descent with warm restarts. In: International Conference on Learning Representations (ICLR), Toulon, France. arXiv:1608.03983

  22. An W, Wang H, Zhang Y, Dai Q (2017) Exponential decay sine wave learning rate for fast deep neural network training. In: 2017 IEEE Visual Communications and Image Processing (VCIP), pp 1–4

  23. Liu P, Wang X (2018) Distance approach for open information extraction based on word vector. KSII Transactions on Internet and Information Systems 12(6). https://doi.org/10.3837/tiis.2018.06.003

  24. Shi Y, Zhu L, Li W, Guo K, Zheng Y (2019) Survey on classic and latest textual sentiment analysis articles and techniques. International Journal of Information Technology and Decision Making (IJITDM) 18(4):1243–1287. World Scientific Publishing Co. Pte. Ltd

  25. Chakravarthy A, Desai P, Deshmukh S, Gawande S, Saha I, Patel M (2018) Hybrid architecture for sentiment analysis using deep learning. 9(1):735–738. ISSN 0976-5697

  26. Salloum SA, Al-Emran M, Monem AA, Shaalan K (2017) A survey of text mining in social media: Facebook and Twitter perspectives. Adv Sci Technol Eng Syst J 2(1):127–133

  27. Mitchell T (1999) Twenty Newsgroups dataset. https://archive.ics.uci.edu/ml/datasets/Twenty+Newsgroups. Accessed 9 Sept 1999

  28. Lewis DD (1997) Reuters-21578 text categorization collection data set. AT&T Labs – Research. https://archive.ics.uci.edu/ml/datasets/reuters-21578+text+categorization+collection. Accessed 16 Feb 1999

  29. Maas AL et al (2011) IMDB movie reviews sentiment dataset. Stanford AI repository. https://www.kaggle.com/jcblaise/imdb-sentiments. Accessed Jun 2011


Author information

Corresponding author

Correspondence to D. Vidyabharathi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Vidyabharathi, D., Mohanraj, V., Kumar, J.S. et al. Achieving generalization of deep learning models in a quick way by adapting T-HTR learning rate scheduler. Pers Ubiquit Comput 27, 1335–1353 (2023). https://doi.org/10.1007/s00779-021-01587-4



  • DOI: https://doi.org/10.1007/s00779-021-01587-4

