Abstract
Deep learning technologies have been broadly applied in both theoretical research and practical applications of intelligent robots. Among the many paradigms, the BP neural network has attracted wide attention as an accurate and flexible tool. However, an open problem remains: the gradient vanishes when the back-propagation strategy is used in multi-layer BP neural networks, and the situation deteriorates sharply when sigmoid transfer functions are employed. To fill this research gap, this study explores a new solution in which the relative magnitude of the gradient is estimated and then neutralized via a newly developed monotonically increasing function. As a result, the undesired vanishing-gradient problem is alleviated while the traditional merits of the gradient descent method are preserved. The validity of the approach is verified by an actual case study of subway passenger flow, and the simulation results demonstrate a superior convergence speed compared with the original algorithm.
This work was funded by the National Natural Science Foundation of China under Grant 41771187. None of the material in this article has previously been published at a conference.
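The abstract does not give the exact compensation function used by the authors. As a minimal illustrative sketch, the snippet below shows why sigmoid back-propagation vanishes (each layer's derivative is at most 0.25, so the back-propagated factor shrinks geometrically with depth) and demonstrates the general idea of rescaling by a hypothetical monotonically increasing function of depth; the specific `boost` function here is an assumption for illustration, not the paper's method.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_deriv(z):
    # Derivative s * (1 - s) peaks at 0.25 when s = 0.5.
    s = sigmoid(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
n_layers = 10
z = rng.normal(size=n_layers)  # one pre-activation per layer

# Back-propagated gradient factor reaching each earlier layer:
# the running product of per-layer sigmoid derivatives.
raw = np.cumprod(sigmoid_deriv(z))

# Hypothetical compensation: scale each layer's gradient by a
# monotonically increasing function of its depth so that early
# layers retain a usable gradient magnitude (illustrative choice).
boost = np.exp(np.arange(n_layers))
compensated = raw * boost

print("deepest raw factor:", raw[-1])
print("deepest compensated factor:", compensated[-1])
```

Since every per-layer derivative is bounded by 0.25, the raw factor after 10 layers is at most 0.25^10 ≈ 1e-6, while the compensated factor stays orders of magnitude larger, which is the qualitative effect the abstract describes.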
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, D., Liu, X., Zhang, J. (2023). Improved Vanishing Gradient Problem for Deep Multi-layer Neural Networks. In: Sun, F., Cangelosi, A., Zhang, J., Yu, Y., Liu, H., Fang, B. (eds) Cognitive Systems and Information Processing. ICCSIP 2022. Communications in Computer and Information Science, vol 1787. Springer, Singapore. https://doi.org/10.1007/978-981-99-0617-8_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-0616-1
Online ISBN: 978-981-99-0617-8