Abstract
Although Adam and its variants are widely used optimization methods, they suffer from issues such as non-convergence and slow optimization speed. Research has shown that weighting past information more heavily in the second moment estimation can benefit the optimization process, but the design of the update formula and the criterion for the ideal switching time remain to be considered. In this paper, a novel optimization method called Adaptive learning Rate switCH (ARCH) is proposed. Through a well-designed update formula, ARCH continuously increases the weight of historical information in the second moment estimation. In addition, the switching time is selected adaptively and automatically based on experimental performance. A theoretical proof of the convergence of the proposed algorithm is also presented. To verify the performance of ARCH, a series of comparative experiments is carried out, comparing ARCH with other optimization methods on several classical convolutional neural networks. Experimental results show that ARCH achieves fast convergence as well as good generalization performance. Moreover, the proposed algorithm is also applied to practical froth flotation monitoring, and the results show that ARCH performs excellently in practical applications as well.
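The core idea described above (weighting past information more heavily in the second moment estimation) can be illustrated with a minimal sketch. The paper's actual ARCH update formula and switching rule are not reproduced here; the schedule beta2_t = 1 - 1/(t + 1), which pushes the second-moment decay factor toward 1 so that older gradients are retained longer, is an illustrative assumption only.

```python
import numpy as np

def adam_heavier_history(grad_fn, x0, lr=0.1, beta1=0.9, eps=1e-8, steps=500):
    """Adam-style optimizer sketch in which the second-moment decay
    factor grows toward 1 over time, so historical gradient information
    is weighted more and more heavily (illustrative, not the ARCH formula)."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)  # first moment estimate
    v = np.zeros_like(x)  # second moment estimate
    for t in range(1, steps + 1):
        g = grad_fn(x)
        beta2_t = 1.0 - 1.0 / (t + 1)           # increasing weight on history
        m = beta1 * m + (1 - beta1) * g
        v = beta2_t * v + (1 - beta2_t) * g**2
        m_hat = m / (1 - beta1**t)              # bias-corrected first moment
        x = x - lr * m_hat / (np.sqrt(v) + eps)
    return x

# Usage: minimize f(x) = x^2, whose gradient is 2x.
x_min = adam_heavier_history(lambda x: 2 * x, np.array([5.0]))
```

As beta2_t approaches 1, the second-moment estimate changes more slowly, which is one way such methods damp the oscillating effective learning rates that can cause plain Adam to diverge.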
Data Availability
The MNIST datasets analysed during this study are available in the National Institute of Standards and Technology (NIST) repository: http://yann.lecun.com/exdb/mnist/. The Cifar-10 datasets used in this study are available on Alex Krizhevsky's home page: http://www.cs.toronto.edu/~kriz/cifar.html. The froth flotation photos are not publicly available due to relevant data protection laws but are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 61873285), the National Key Research and Development Program of China (Grant No. 2018AAA0101603), and the International Cooperation and Exchange program of the National Natural Science Foundation of China (Grant No. 61860206014).
About this article
Cite this article
Ma, B., Du, Y., Zhou, X. et al. A novel adaptive optimization method for deep learning with application to froth floatation monitoring. Appl Intell 53, 11820–11832 (2023). https://doi.org/10.1007/s10489-022-04083-1