
Neurocomputing

Volume 363, 21 October 2019, Pages 88-98

ReLTanh: An activation function with vanishing gradient resistance for SAE-based DNNs and its application to rotating machinery fault diagnosis

https://doi.org/10.1016/j.neucom.2019.07.017

Abstract

Tanh is a sigmoidal activation function that suffers from the vanishing gradient problem, so researchers have proposed alternative functions such as the rectified linear unit (ReLU); however, these vanishing-proof functions introduce other problems of their own, such as bias shift and noise sensitivity. To overcome the vanishing gradient problem without introducing such side effects, we propose a new activation function named Rectified Linear Tanh (ReLTanh), obtained by improving the traditional Tanh. ReLTanh is constructed by replacing Tanh's saturated waveforms in the positive and negative inactive regions with two straight lines, whose slopes are Tanh's derivatives at two learnable thresholds. The middle Tanh waveform provides ReLTanh with nonlinear fitting ability, and the linear parts relieve the vanishing gradient problem. Moreover, the thresholds that determine the slopes of the linear parts are learnable, so ReLTanh can tolerate variations in its inputs and help minimize the cost function and maximize data-fitting performance. Theoretical proofs by mathematical derivation demonstrate that ReLTanh diminishes the vanishing gradient problem and that its thresholds can feasibly be trained. To verify the practical feasibility and effectiveness of ReLTanh, fault diagnosis experiments on planetary gearboxes and rolling bearings are conducted with stacked autoencoder-based deep neural networks (SAE-based DNNs). ReLTanh successfully alleviates the vanishing gradient problem and learns faster, more steadily and more precisely than Tanh, which is consistent with the theoretical analysis. Additionally, ReLTanh surpasses other popular activation functions such as the ReLU family, Hexpo and Swish, which shows that ReLTanh has application potential and research value.

Introduction

With the increasing complexity of modern industrial systems, rotating machinery fault diagnosis plays an ever more important role in their reliability and safety [1], [2]. Rotating components such as planetary gears and rolling bearings always have high fault probabilities because of their complex structures and harsh operating conditions [3], [4]. To make matters worse, if early minor failures are not detected and repaired in time, they will deteriorate rapidly and lead to a serious halt of the whole power transmission chain, and even to catastrophic economic losses and casualties [5], [6]. Therefore, in order to prevent faults from deteriorating, it is necessary to diagnose them as early as possible.

Research on fault detection for rotating components has attracted a lot of attention, but the task remains challenging [7], [8]. Traditional diagnosis methods are based on vibration signal analysis techniques such as the short-time Fourier transform, the Wigner-Ville distribution and the wavelet transform [9], but they have tedious procedures and relatively unsatisfactory performance. Therefore, various intelligent and automatic methods based on deep learning models, including deep neural networks (DNNs), have been proposed to simplify the process and improve accuracy [10]. For example, Jia et al. [11] compared the performance of DNNs and artificial neural networks (ANNs) on fault diagnosis for planetary gearboxes. Xu et al. [12] applied a sparse autoencoder-based deep neural network to open-circuit fault diagnosis. Lu et al. [13] proposed a DNN-based feature extraction method for rolling bearing fault diagnosis.

Tanh is a typical sigmoidal activation function, and it is popular in shallow networks such as ANNs [14] because it outputs zero-centered nonlinear activations. However, Tanh is abandoned when it comes to deep models due to the vanishing gradient problem, so we propose an improved activation function, ReLTanh, based on Tanh to retard this problem.

Tanh can squash large-scale inputs into the interval [−1, 1] and provide a nonlinear and noise-robust representation. At the same time, however, this saturation characteristic also causes gradients to vanish. Once the inputs fall into the saturation regions, they get relatively small gradients close to zero, which slows down the updating of weights and biases [15]. Even worse, the gradients decrease exponentially as the depth of the network increases, because gradient computation in the back-propagation (BP) process is based on the chain rule and all layers are interconnected and interlocked [16]. Sometimes, the neurons in the lower layers of a multi-layer network can hardly be updated and may even die, which blocks DNNs from deepening further [17]. Due to the vanishing gradient problem, training generally takes more computational power, and DNNs are more likely to converge to a local minimum.
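The exponential decay described above can be illustrated numerically. The following is a minimal sketch (not from the paper): along one back-propagation path of a Tanh network, the gradient reaching the lowest layer contains a product of local derivatives Tanh′(z) = 1 − Tanh(z)², each at most 1 and far smaller in the saturated regions, so the product shrinks roughly exponentially with depth.

```python
import numpy as np

def tanh_grad(z):
    """Derivative of Tanh: 1 - tanh(z)^2, which is <= 1 everywhere."""
    return 1.0 - np.tanh(z) ** 2

def gradient_attenuation(pre_activations):
    """Product of the local Tanh derivatives along one BP path."""
    return float(np.prod([tanh_grad(z) for z in pre_activations]))

# Mildly saturated pre-activations (|z| = 2) at every layer: the factor
# tanh'(2) is about 0.07, so each extra layer shrinks the gradient ~14x.
for depth in (2, 5, 10):
    print(depth, gradient_attenuation([2.0] * depth))
```

Even at a modest depth of 10 the surviving gradient is many orders of magnitude below 1, which is why the lowest layers barely update.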

In order to alleviate the vanishing gradient problem, many methods have been developed in recent years, such as the layer-wise pre-training algorithm [18] and the rectified linear unit (ReLU) family [19]. Unsupervised pre-training helps diminish the vanishing gradient problem by providing a better weight initialization for deep models. For example, a stacked autoencoder-based DNN (SAE-based DNN) can pre-train autoencoders to preview the data, so it is adopted for diagnosis in this study [20]. The ReLU family overcomes the vanishing gradient problem through a straight line with a fixed slope of 1 in the positive interval [21]. However, this straight line is a double-edged sword: it provides a noise-sensitive representation for all layers and affects the convergence of learning [22]. Besides, it brings a certain degree of bias shift as well. For example, ReLU is the identity for positive inputs and zero otherwise, so it has a non-negative mean activation output [25]. According to Refs. [23], [28], units with a non-zero mean activation output cause a bias shift for the next layer, and bias shift leads to oscillations and impedes learning. Even worse, bias shift may aggravate as the depth of the model increases, just like the vanishing gradient problem.
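The bias-shift argument can be checked with a quick simulation (our own illustration, not code from the paper): for zero-mean pre-activations, ReLU's outputs have a strictly positive mean, while Tanh's outputs stay centered near zero.

```python
import numpy as np

# Zero-mean, unit-variance pre-activations, as a stand-in for one layer's
# inputs to the activation function.
rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)

relu_mean = float(np.mean(np.maximum(z, 0.0)))  # ReLU: max(z, 0)
tanh_mean = float(np.mean(np.tanh(z)))          # Tanh: odd, zero-centered

# For N(0, 1) inputs, E[max(z, 0)] = 1/sqrt(2*pi) ~ 0.399, whereas the
# expected Tanh output is exactly 0 by symmetry.
print(f"mean ReLU output: {relu_mean:.3f}")
print(f"mean Tanh output: {tanh_mean:.3f}")
```

The persistent positive mean of ReLU's output is what shifts the effective bias seen by the next layer; a zero-centered function like Tanh (or ReLTanh, whose mean output is closer to zero) avoids this.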

In order to diminish the vanishing gradient problem that perplexes Tanh and to reduce the bias shift and noise sensitivity that torment the ReLU family, we propose a new activation function: ReLTanh. ReLTanh is created by replacing the saturated waveforms with two straight lines, so it consists of the nonlinear Tanh in the center and two linear parts on both ends. The positive line is steeper than the negative one; both lines start at two learnable thresholds, and their slopes are Tanh's derivative values at these thresholds. The thresholds can be trained and updated along the gradient descent direction of the cost function. Thus ReLTanh relieves the vanishing gradient problem just like the ReLU family, while its mean outputs are closer to zero, so it is less affected by bias shift than the ReLU family. Additionally, complete mathematical derivations verify theoretically that ReLTanh DNNs are effective at weakening the vanishing gradient problem and that training the thresholds is feasible.

To further validate the practical feasibility of ReLTanh, ReLTanh SAE-based DNNs are employed in fault diagnosis experiments on planetary gearboxes and rolling bearings. First, for the planetary gearboxes, vibration signals are collected by the authors from a professional test rig. Then, following Fig. 1, statistical feature extraction, vector sample construction and fault recognition are performed in sequence. The results demonstrate that ReLTanh successfully relaxes the vanishing gradient problem and surpasses Tanh in multiple aspects. Moreover, ReLTanh achieves higher accuracies than popular activation functions such as the ReLU family, Hexpo and Swish. Next, the internationally recognized motor rolling bearing dataset from Case Western Reserve University is employed to train and test ReLTanh DNNs again, and similar results are obtained. These two experiments yield results and conclusions in line with the theoretical analysis, and they demonstrate ReLTanh's application and development potential in both theory and practice.

The remainder of the paper is organized as follows. Section 2 gives a detailed introduction to the vanishing gradient problem and popular activation functions, and briefly presents the architectures and learning rules of SAE-based DNNs. The definition of ReLTanh and the relevant mathematical derivations are described in Section 3. Section 4 introduces the application of ReLTanh DNNs to fault diagnosis for planetary gearboxes according to Fig. 1. Section 5 applies ReLTanh to rolling bearing fault diagnosis. Finally, conclusions are drawn in Section 6.


Vanishing gradient problem

The main purpose of this study is to weaken the vanishing gradient problem through ReLTanh. For supervised learning models such as deep neural networks (DNNs), the ultimate training goal during the BP process is to fit the labeled data and find the global minimum of the cost function by gradient descent methods. The partial gradients of the cost function with respect to the weights are a key multiplier in the updating formula. But by the chain rule, they cause serious decay of the updating values when several

Definition of ReLTanh

The activation function ReLTanh is defined as follows:

$$
\mathrm{ReLTanh}(x)=
\begin{cases}
\mathrm{Tanh}'(\lambda^{+})\,(x-\lambda^{+})+\mathrm{Tanh}(\lambda^{+}), & x \ge \lambda^{+},\\
\mathrm{Tanh}(x), & \lambda^{-} < x < \lambda^{+},\\
\mathrm{Tanh}'(\lambda^{-})\,(x-\lambda^{-})+\mathrm{Tanh}(\lambda^{-}), & x \le \lambda^{-},
\end{cases}
$$

where $\lambda^{+}_{\mathrm{lower}} \le \lambda^{+} \le \lambda^{+}_{\mathrm{upper}}$ and $\lambda^{-}_{\mathrm{lower}} \le \lambda^{-} \le \lambda^{-}_{\mathrm{upper}}$.

It is obvious that ReLTanh consists of a piece of nonlinear Tanh waveform in the center and two linear parts on both ends, where λ+ and λ− are respectively the positive and negative thresholds that determine the start positions and slopes of the straight lines. Besides, both λ+ and λ− can be trained by the BP algorithm, and
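The piecewise definition above translates directly into code. The following is a minimal NumPy sketch of the forward pass (parameter names and default threshold values are our own illustrative choices, not taken from the paper's implementation): Tanh in the middle, and beyond each threshold a straight line whose slope is Tanh's derivative at that threshold, which makes the function continuous and differentiable at the joints.

```python
import numpy as np

def tanh_grad(z):
    """Derivative of Tanh: 1 - tanh(z)^2."""
    return 1.0 - np.tanh(z) ** 2

def reltanh(x, lam_pos=0.5, lam_neg=-1.5):
    """Forward pass of ReLTanh with thresholds lam_pos (> 0) and lam_neg (< 0)."""
    x = np.asarray(x, dtype=float)
    out = np.tanh(x)                       # middle region: plain Tanh
    pos = x >= lam_pos                     # positive linear part
    out[pos] = tanh_grad(lam_pos) * (x[pos] - lam_pos) + np.tanh(lam_pos)
    neg = x <= lam_neg                     # negative linear part
    out[neg] = tanh_grad(lam_neg) * (x[neg] - lam_neg) + np.tanh(lam_neg)
    return out
```

With these example thresholds, the positive line's slope tanh′(0.5) ≈ 0.79 is steeper than the negative line's slope tanh′(−1.5) ≈ 0.18, matching the asymmetry described above; unlike Tanh, the output is unbounded, so gradients do not vanish for large inputs.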

Experimental setup

As shown in Fig. 8, the test rig is a drivetrain diagnostics simulator (DDS) designed by SpectraQuest Inc. (company website: http://www.pinxuntech.com/), and it mainly consists of a driving motor, a two-stage planetary gearbox, a two-stage parallel-axis gearbox and a programmable magnetic brake. In this study, we focus on the secondary sun gear of the planetary gearbox because of its higher failure rate than other components in the gearbox [36]. Four most typical gear

Fault diagnosis for rolling bearing

In order to further prove the effectiveness and superiority of ReLTanh against the vanishing gradient problem, rolling bearing fault datasets provided by the Bearing Data Center of Case Western Reserve University (CWRU) (available at http://csegroups.case.edu/bearingdatacenter/home) are applied to test ReLTanh again.

Conclusions

Aiming to overcome the vanishing gradient problem suffered by Tanh, ReLTanh is created by improving Tanh for faster and more precise learning in SAE-based DNNs. ReLTanh is composed of three parts: a line with a relatively gentle slope in the negative interval, a nonlinear part retained from Tanh in the middle, and a line with a steeper slope in the positive interval. The slopes of the two lines can be trained by updating the thresholds according to the cost function, and detailed mathematical derivations

Declaration of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The work described in this paper was supported by the National Natural Science Foundation of China (no. 51675065), Chongqing Research Program of Basic Research and Frontier Technology (no. cstc2017jcyjAX0459), National Key R&D Program of China (no. 2018YFB2001300) and the Fundamental Research Funds for the Central Universities (nos. 2018CDQYJX0011 and 2018CDJDCD0001).


References (40)

  • G.F. Liu et al.

    A stacked autoencoder-based deep neural network for achieving gearbox fault diagnosis

    Math. Probl. Eng.

    (2018)
  • B.K. Humpert

    Improving back propagation with a new error function

    Neural Netw.

    (1994)
  • H. Geng et al.

    Bias estimation for asynchronous multi-rate multi-sensor fusion with unknown inputs

    Inf. Fusion

    (2018)
  • Y. Qin et al.

    Transient feature extraction by the improved orthogonal matching pursuit and K-SVD algorithm with adaptive transient dictionary

    IEEE Trans. Ind. Info.

    (2019)
  • L. Wang et al.

    Wind turbine gearbox failure identification with deep neural networks

    IEEE Trans. Ind. Info.

    (2017)
  • Y. Qin et al.

    The optimized deep belief networks with improved logistic sigmoid units and their application in fault diagnosis for planetary gearboxes of wind turbines

    IEEE Trans. Ind. Electron.

    (2019)
  • W.N. Lu et al.

    A novel feature extraction method using deep neural network for rolling bearing fault diagnosis

  • M.M. Lau et al.

    Investigation of activation functions in deep belief network

  • P. Ghahremani et al.

    Linearly augmented deep neural network

  • M.S. Ebrahimi, H.K. Abadi, Study of residual networks for image recognition, arXiv preprint arXiv:1805.00325

    Xin Wang received the B.Eng. degree in automotive engineering from Chongqing University, Chongqing, China, in 2017.

    He is currently working toward the M.S. degree in School of Automotive Engineering of Chongqing University, Chongqing, China. His research interests mainly include signal processing, intelligent mechanical fault diagnosis and artificial intelligence.

    Yi Qin received the B.Eng. and Ph.D. degrees in mechanical engineering from Chongqing University, Chongqing, China, in 2004 and 2008, respectively.

    Since January 2009, he has been with the Chongqing University, Chongqing, China, where he is currently a Professor in the College of Mechanical Engineering. His current research interests include signal processing, fault prognosis, mechanical dynamics and smart structure.

    Dr. Qin is a Member of IEEE.

    Yi Wang received his B.Eng. degree in mechanical engineering from Southwest Jiaotong University, Chengdu, China, in 2011, and his Ph.D. degree in mechanical engineering from Xi'an Jiaotong University, Xi'an, China, in 2017. From August 2016 to February 2017, he was a visiting scholar at City University of Hong Kong, Hong Kong, China.

    He is currently a Lecturer in the College of Mechanical Engineering, Chongqing University, Chongqing, China. His current research interests include mechanical signal processing, weak signal detection, rotating machinery fault diagnosis under speed-variation conditions, manifold learning and deep learning.

    Sheng Xiang received the B. Eng. degree in mechanical engineering from Yangtze University, Hubei, China, in 2017.

    He is currently working toward the Ph.D. degree in mechanical engineering at Chongqing University, Chongqing, China. His research interests mainly include signal processing, mechanical fault diagnosis and residual life prediction.

    Haizhou Chen received the M.A. Eng. and Ph.D. degrees in mechanical engineering from Chongqing University, Chongqing, China, in 2010 and 2017 respectively.

    Since July 2017, he has been with the Qingdao University of Science and Technology, Qingdao, China, where he is currently a Lecturer in the College of Electromechanical Engineering. His current research interests include failure mechanism analysis, fault prognosis and tribology.
