ReLTanh: An activation function with vanishing gradient resistance for SAE-based DNNs and its application to rotating machinery fault diagnosis
Introduction
With the increasing complexity of modern industrial systems, rotating machinery fault diagnosis plays an ever more important role in their reliability and safety [1], [2]. Rotating components such as planetary gears and rolling bearings tend to have high failure probabilities because of their complex structures and harsh operating conditions [3], [4]. Worse still, if early minor failures are not detected and repaired in time, they can deteriorate rapidly and bring the whole power transmission chain to a serious halt, possibly causing catastrophic economic losses and casualties [5], [6]. Therefore, to prevent faults from deteriorating, it is necessary to diagnose them as early as possible.
Research on fault detection of rotating components has attracted a lot of attention, but the task remains challenging [7], [8]. Traditional diagnosis methods rely on vibration signal analysis techniques such as the short-time Fourier transform, the Wigner-Ville distribution and the wavelet transform [9], but they involve tedious procedures and give relatively unsatisfactory performance. Therefore, various intelligent and automatic methods based on deep learning models, including deep neural networks (DNNs), have been proposed to simplify the process and improve accuracy [10]. For example, Jia et al. [11] compared the performance of DNNs and artificial neural networks (ANNs) on fault diagnosis of planetary gearboxes. Xu et al. [12] applied a sparse autoencoder-based deep neural network to open-circuit fault diagnosis. Lu et al. [13] proposed a DNN-based feature extraction method for rolling bearing fault diagnosis.
Tanh is a typical sigmoidal activation function and is popular in shallow networks such as ANNs [14] because its non-linear activations are zero-centered. However, Tanh is generally abandoned in deep models because of the vanishing gradient problem, so we propose an improved activation function based on Tanh, ReLTanh, to mitigate this problem.
Tanh squashes large-scale inputs into the interval (−1, 1) and provides a non-linear and noise-robust representation. At the same time, however, its saturation characteristic makes gradients vanish. Once inputs fall into the saturation regions, they receive small gradients close to zero, which slows down the updating of weights and biases [15]. Worse, the gradients decrease exponentially as the network depth increases, because gradient computation in the back-propagation (BP) process is based on the chain rule and all layers are interconnected [16]. Sometimes the neurons in the lower layers of a multi-layer network can hardly be updated and may even die, which prevents DNNs from going deeper [17]. Because of the vanishing gradient problem, training generally requires more computational power, and DNNs are more likely to converge to a local minimum.
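The saturation and the multiplicative decay described above can be checked numerically. The following minimal Python sketch is an illustration, not code from the paper: the hypothetical helper `tanh_grad` evaluates Tanh's derivative, and `chained_gradient` multiplies `depth` copies of it, under the simplifying assumption that every layer sees the same pre-activation and has unit weight.

```python
import math


def tanh_grad(x: float) -> float:
    """Derivative of Tanh: d/dx tanh(x) = 1 - tanh(x)**2 (at most 1, reached at x = 0)."""
    t = math.tanh(x)
    return 1.0 - t * t


def chained_gradient(depth: int, x: float) -> float:
    """Product of `depth` identical Tanh derivatives: a simplified stand-in for the
    chain-rule factors that back-propagation multiplies together
    (assumes unit weights and the same pre-activation x in every layer)."""
    return tanh_grad(x) ** depth


# Local gradient near the origin and inside the saturation regions.
for x in (0.0, 1.0, 2.0, 3.0, 5.0):
    print(f"x = {x:.1f}   tanh'(x) = {tanh_grad(x):.6f}")

# Even a moderate pre-activation shrinks the gradient geometrically with depth.
for depth in (1, 5, 10, 20):
    print(f"depth = {depth:2d}   product = {chained_gradient(depth, 1.0):.2e}")
```

Because every factor is below 1 except at x = 0, the product shrinks geometrically as layers are added; this is the decay that ReLTanh's linear parts are meant to counteract.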
Many methods have been developed in recent years to alleviate the vanishing gradient problem, such as the layer-wise pre-training algorithm [18] and the rectified linear unit (ReLU) family [19]. Unsupervised pre-training helps diminish the vanishing gradient problem by providing a better weight initialization for deep models. For example, a stacked autoencoder-based DNN (SAE-based DNN) pre-trains its autoencoders to preview the data, so it is used for diagnosis in this study [20]. The ReLU family overcomes the vanishing gradient problem through straight lines with a fixed slope of 1 in the positive interval [21]. However, these straight lines are a double-edged sword: they provide a noise-sensitive representation in all layers and affect the convergence of learning [22]. They also introduce a certain degree of bias shift. For example, ReLU is the identity for positive inputs and zero otherwise, so its mean activation output is non-negative [25]. According to Refs. [23], [28], units with a non-zero mean activation output cause a bias shift in the next layer, and bias shift leads to oscillations that impede learning. Worse still, bias shift may grow as models become deeper, just like the vanishing gradient problem.
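The bias-shift argument can be illustrated by comparing mean outputs on zero-mean inputs. This is a minimal Python sketch under an assumed input set (601 evenly spaced points on [−3, 3]); it is not an experiment from the paper, and `mean_activation` is a hypothetical helper.

```python
import math


def relu(x: float) -> float:
    """ReLU: identity for positive inputs, zero otherwise."""
    return x if x > 0.0 else 0.0


def mean_activation(fn, inputs):
    """Average output of activation `fn` over `inputs`."""
    return sum(fn(x) for x in inputs) / len(inputs)


# Symmetric, zero-mean inputs: -3.00, -2.99, ..., 2.99, 3.00 (an assumption).
inputs = [(i - 300) / 100 for i in range(601)]

relu_mean = mean_activation(relu, inputs)
tanh_mean = mean_activation(math.tanh, inputs)

print(f"mean input:       {sum(inputs) / len(inputs):+.4f}")
print(f"mean ReLU output: {relu_mean:+.4f}")  # positive: shifts the next layer's input
print(f"mean Tanh output: {tanh_mean:+.4f}")  # close to zero, since Tanh is odd
```

Although the inputs are centered, the ReLU outputs passed to the next layer have a positive mean, which is the bias shift described above; Tanh, being an odd function, does not introduce this shift for symmetric inputs.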
To diminish the vanishing gradient problem that affects Tanh and to reduce the bias shift and noise sensitivity that afflict the ReLU family, we propose a new activation function: ReLTanh. ReLTanh is created by replacing the saturated regions of Tanh with two straight lines, so it consists of the nonlinear Tanh in the center and two linear parts on both ends. Each line starts at a learnable threshold, and its slope equals Tanh's derivative at that threshold; the positive line is steeper than the negative one. The thresholds are trained and updated along the gradient descent direction of the cost function. ReLTanh can therefore alleviate the vanishing gradient problem in the same way as the ReLU family, while its mean output is closer to zero, so it suffers less from bias shift than the ReLU family. Additionally, complete mathematical derivations are performed to verify theoretically that ReLTanh DNNs weaken the vanishing gradient problem and that training the slopes is feasible.
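The construction described above can be written down directly. The following Python sketch is our reading of the description (Tanh between the thresholds; beyond them, straight lines that start at the thresholds with slopes equal to Tanh's derivative there), not the authors' implementation. The threshold values `LAM_POS = 0.5` and `LAM_NEG = -1.5` are hypothetical; with Tanh's symmetric derivative, the positive line is steeper only when the positive threshold is closer to zero than the negative one, as it is here.

```python
import math


def tanh_grad(x: float) -> float:
    """Derivative of Tanh: 1 - tanh(x)**2."""
    t = math.tanh(x)
    return 1.0 - t * t


def reltanh(x: float, lam_pos: float, lam_neg: float) -> float:
    """Tanh between the thresholds; tangent-slope straight lines outside them."""
    if x >= lam_pos:
        return tanh_grad(lam_pos) * (x - lam_pos) + math.tanh(lam_pos)
    if x <= lam_neg:
        return tanh_grad(lam_neg) * (x - lam_neg) + math.tanh(lam_neg)
    return math.tanh(x)


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


LAM_POS, LAM_NEG = 0.5, -1.5  # hypothetical threshold values

pos_slope = tanh_grad(LAM_POS)  # slope of the positive line
neg_slope = tanh_grad(LAM_NEG)  # slope of the negative line
print(f"positive slope {pos_slope:.4f}, negative slope {neg_slope:.4f}")

# The lines start at the thresholds, so the function is continuous there.
gap = abs(reltanh(LAM_POS - 1e-9, LAM_POS, LAM_NEG) - reltanh(LAM_POS, LAM_POS, LAM_NEG))
print(f"jump at positive threshold: {gap:.2e}")

# Mean output over symmetric, zero-mean inputs (an assumed input set).
inputs = [(i - 300) / 100 for i in range(601)]
mean_reltanh = sum(reltanh(x, LAM_POS, LAM_NEG) for x in inputs) / len(inputs)
mean_relu = sum(relu(x) for x in inputs) / len(inputs)
print(f"mean ReLTanh {mean_reltanh:+.4f}, mean ReLU {mean_relu:+.4f}")
```

Beyond each threshold the derivative stays fixed at the line's slope instead of decaying toward zero as Tanh's does. With these assumed thresholds the mean output on the centered inputs is smaller in magnitude than ReLU's, in line with the claim that ReLTanh suffers less bias shift.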
To further validate the practical feasibility of ReLTanh, ReLTanh SAE-based DNNs are applied in fault diagnosis experiments on planetary gearboxes and rolling bearings. First, for planetary gearboxes, we collected vibration signals ourselves on a professional test rig. Then, following Fig. 1, statistical feature extraction, vector sample construction and fault recognition are performed in sequence. The results demonstrate that ReLTanh successfully relieves the vanishing gradient problem and outperforms Tanh in several respects. Moreover, ReLTanh provides higher accuracies than popular activation functions such as the ReLU family, Hexpo and Swish. Next, the internationally recognized motor rolling bearing dataset of Case Western Reserve University is used to train and test ReLTanh DNNs again, and similar results are obtained. The results and conclusions of these two experiments agree with the theoretical analysis, demonstrating both in theory and in practice the potential of ReLTanh for application and further development.
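The statistical feature extraction and vector sample construction steps can be sketched as follows. The specific features (mean, standard deviation, RMS, peak, skewness, kurtosis, crest factor), the window length of 1024 points and the 512-point step are assumptions for illustration, since the paper's actual feature set is not given in this excerpt; `segment`, `statistical_features` and `build_samples` are hypothetical helpers.

```python
import math


def segment(signal, length, step):
    """Split a 1-D signal into windows of `length` points, `step` points apart."""
    return [signal[i:i + length] for i in range(0, len(signal) - length + 1, step)]


def statistical_features(window):
    """Common time-domain statistics of one window (an assumed feature set).

    Kurtosis here is the non-excess (Pearson) version, equal to 3 for a Gaussian.
    """
    n = len(window)
    mean = sum(window) / n
    centered = [x - mean for x in window]
    var = sum(c * c for c in centered) / n
    std = math.sqrt(var)
    rms = math.sqrt(sum(x * x for x in window) / n)
    peak = max(abs(x) for x in window)
    skewness = (sum(c ** 3 for c in centered) / n) / std ** 3 if std > 0 else 0.0
    kurtosis = (sum(c ** 4 for c in centered) / n) / var ** 2 if var > 0 else 0.0
    crest = peak / rms if rms > 0 else 0.0
    return [mean, std, rms, peak, skewness, kurtosis, crest]


def build_samples(signal, length=1024, step=512):
    """One feature vector (sample) per window; such vectors would be fed to the DNN."""
    return [statistical_features(w) for w in segment(signal, length, step)]


# Usage on a synthetic signal: a sine wave with a periodic impulse (illustrative only).
signal = [math.sin(0.05 * k) + (3.0 if k % 400 == 0 else 0.0) for k in range(4096)]
samples = build_samples(signal)
print(len(samples), "samples of", len(samples[0]), "features")
```

Overlapping windows are a common way to obtain more training samples from a limited recording; whether the paper overlaps its segments is not stated here.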
The remainder of the paper is organized as follows. Section 2 briefly presents the vanishing gradient problem, popular activation functions, and the architectures and learning rules of SAE-based DNNs. Section 3 gives the definition of ReLTanh and the relevant mathematical derivations. Section 4 applies ReLTanh DNNs to fault diagnosis of planetary gearboxes following Fig. 1. Section 5 applies ReLTanh to rolling bearing fault diagnosis. Finally, conclusions are drawn in Section 6.
Vanishing gradient problem
The main purpose of ReLTanh in this study is to weaken the vanishing gradient problem. For supervised learning models such as deep neural networks (DNNs), the ultimate training goal of the BP process is to fit the labeled data and find the global minimum of the cost function by gradient descent. The partial gradients of the cost function with respect to the weights are a key multiplier in the update formula. However, by the chain rule, they cause serious decay in the update values when several …
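The decay through the chain rule can be traced in a toy model. The sketch below uses a hypothetical scalar network (one Tanh unit per layer, all weights 1, input 1) in which the network output stands in for the cost function; it back-propagates by hand and reports the gradient with respect to each layer's weight. It illustrates the mechanism only and is not the paper's derivation; `forward` and `backprop` are hypothetical helpers.

```python
import math


def forward(x, weights):
    """Scalar chain a_l = tanh(w_l * a_{l-1}), one Tanh unit per layer.

    Returns [a_0, a_1, ..., a_L] with a_0 = x.
    """
    acts = [x]
    for w in weights:
        acts.append(math.tanh(w * acts[-1]))
    return acts


def backprop(x, weights):
    """Gradient of the output a_L with respect to every weight, via the chain rule."""
    acts = forward(x, weights)
    grads = [0.0] * len(weights)
    upstream = 1.0  # d a_L / d a_L
    for l in reversed(range(len(weights))):
        local = 1.0 - acts[l + 1] ** 2            # tanh'(z) for this layer
        grads[l] = upstream * local * acts[l]     # dz/dw = input activation
        upstream = upstream * local * weights[l]  # dz/d(input activation) = w
    return grads


weights = [1.0] * 10  # hypothetical 10-layer toy network
grads = backprop(1.0, weights)
for l, g in enumerate(grads, start=1):
    print(f"layer {l:2d}: |d a_L / d w_{l}| = {abs(g):.4f}")
```

Each step back multiplies by another tanh'(z)·w factor, so the first layer's gradient is several times smaller than the last layer's even in this shallow, unsaturated example; larger weights push the units into saturation and widen the gap much further.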
Definition of ReLTanh
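The activation function ReLTanh is defined as follows:

$$
\mathrm{ReLTanh}(x)=
\begin{cases}
\mathrm{Tanh}'(\lambda^{+})\,(x-\lambda^{+})+\mathrm{Tanh}(\lambda^{+}), & x\ge\lambda^{+},\\
\mathrm{Tanh}(x), & \lambda^{-}<x<\lambda^{+},\\
\mathrm{Tanh}'(\lambda^{-})\,(x-\lambda^{-})+\mathrm{Tanh}(\lambda^{-}), & x\le\lambda^{-},
\end{cases}
\qquad \mathrm{Tanh}'(x)=1-\mathrm{Tanh}^{2}(x).
$$

ReLTanh thus consists of a piece of the nonlinear Tanh waveform in the center and two linear parts on both ends, where $\lambda^{+}$ and $\lambda^{-}$ are respectively the positive and negative thresholds that determine the start positions and slopes of the straight lines. In addition, both $\lambda^{+}$ and $\lambda^{-}$ can be trained by the BP algorithm, and …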
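The thresholds are trainable because ReLTanh is differentiable with respect to them on its linear parts. On such a part, f(x) = Tanh'(λ)(x − λ) + Tanh(λ), so df/dλ = Tanh''(λ)(x − λ) − Tanh'(λ) + Tanh'(λ) = Tanh''(λ)(x − λ), with Tanh''(λ) = −2 Tanh(λ)(1 − Tanh²(λ)). The Python sketch below is our own derivation from the piecewise description in this paper, not the authors' code; `lam_pos` and `lam_neg` are hypothetical names for the positive and negative thresholds, and the squared-error loss, target, threshold values and learning rate in the usage example are assumptions.

```python
import math


def tanh_grad(x: float) -> float:
    """First derivative of Tanh: 1 - tanh(x)**2."""
    t = math.tanh(x)
    return 1.0 - t * t


def tanh_second(x: float) -> float:
    """Second derivative of Tanh: -2 tanh(x) (1 - tanh(x)**2)."""
    t = math.tanh(x)
    return -2.0 * t * (1.0 - t * t)


def reltanh(x, lam_pos, lam_neg):
    if x >= lam_pos:
        return tanh_grad(lam_pos) * (x - lam_pos) + math.tanh(lam_pos)
    if x <= lam_neg:
        return tanh_grad(lam_neg) * (x - lam_neg) + math.tanh(lam_neg)
    return math.tanh(x)


def reltanh_grads(x, lam_pos, lam_neg):
    """Return (df/dx, df/dlam_pos, df/dlam_neg).

    On a linear part the -T'(lam) and +T'(lam) terms cancel, leaving
    df/dlam = T''(lam) (x - lam). In the Tanh part neither threshold matters.
    """
    if x >= lam_pos:
        return tanh_grad(lam_pos), tanh_second(lam_pos) * (x - lam_pos), 0.0
    if x <= lam_neg:
        return tanh_grad(lam_neg), 0.0, tanh_second(lam_neg) * (x - lam_neg)
    return tanh_grad(x), 0.0, 0.0


# One gradient-descent step on the positive threshold for a single sample,
# using a squared-error loss L = 0.5 (f(x) - y)^2 (an illustrative assumption).
lam_pos, lam_neg, lr = 0.5, -1.5, 0.1
x, y = 2.0, 1.2
f = reltanh(x, lam_pos, lam_neg)
_, df_dlam_pos, _ = reltanh_grads(x, lam_pos, lam_neg)
lam_pos_new = lam_pos - lr * (f - y) * df_dlam_pos
print(f"f = {f:.4f}, dL/dlam_pos = {(f - y) * df_dlam_pos:.4f}, new lam_pos = {lam_pos_new:.4f}")
```

Since Tanh''(0) = 0, a threshold sitting exactly at zero receives no gradient from this term.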
Experimental setup
As shown in Fig. 8, the test rig is a drivetrain diagnostics simulator (DDS) designed by SpectraQuest Inc. (the company website is available at “http://www.pinxuntech.com/”). It mainly consists of a driving motor, a two-stage planetary gearbox, a two-stage parallel-axis gearbox and a programmable magnetic brake. In this study, we focus on the secondary sun gear of the planetary gearbox because it has a higher failure rate than the other components in the gearbox [36]. The four most typical gear …
Fault diagnosis for rolling bearing
To further demonstrate the effectiveness and superiority of ReLTanh against the vanishing gradient problem, the rolling bearing fault datasets provided by the Bearing Data Centre of Case Western Reserve University (CWRU) (available at “http://csegroups.case.edu/bearingdatacenter/home”) are used to test ReLTanh again.
Conclusions
Aiming to overcome the vanishing gradient problem suffered by Tanh, ReLTanh is created by improving Tanh for faster and more precise learning in SAE-based DNNs. ReLTanh is composed of three parts: a line with a relatively gentle slope in the negative interval, a non-linear part retained from Tanh in the middle, and a line with a steeper slope in the positive interval. The slopes of the two lines can be trained by updating the thresholds according to the cost function, and detailed mathematical derivations …
Declaration of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The work described in this paper was supported by the National Natural Science Foundation of China (no. 51675065), Chongqing Research Program of Basic Research and Frontier Technology (no. cstc2017jcyjAX0459), National Key R&D Program of China (no. 2018YFB2001300) and the Fundamental Research Funds for the Central Universities (nos. 2018CDQYJX0011 and 2018CDJDCD0001).
References (40)
- et al. An intelligent diagnosis scheme based on generative adversarial learning deep neural networks and its application to planetary gearbox fault pattern recognition. Neurocomputing (2018).
- et al. Quantitative fault analysis of roller bearings based on a novel matching pursuit method with a new step-impulse dictionary. Mech. Syst. Signal Process. (2016).
- et al. Multimodal deep support vector classification with homologous features and its application to gearbox fault diagnosis. Neurocomputing (2015).
- et al. Design of optimal lighting control strategy based on multi-variable fractional-order extremum seeking method. Inf. Sci. (2018).
- et al. Model-reduced fault detection for multi-rate sensor fusion with unknown inputs. Inf. Fusion (2017).
- et al. Fractional-order sliding mode based extremum seeking control of a class of nonlinear systems. Automatica (2014).
- et al. Multicomponent decomposition by wavelet modulus maxima and synchronous detection. Mech. Syst. Signal Process. (2017).
- et al. Deep neural networks: a promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data. Mech. Syst. Signal Process. (2016).
- et al. Open-circuit fault diagnosis of power rectifier using sparse autoencoder based deep neural network. Neurocomputing (2018).
- et al. Early fault diagnosis of rotating machinery based on wavelet packets-Empirical mode decomposition feature extraction and neural network. Mech. Syst. Signal Process. (2012).
- A stacked autoencoder-based deep neural network for achieving gearbox fault diagnosis. Math. Probl. Eng.
- Improving back propagation with a new error function. Neural Netw.
- Bias estimation for asynchronous multi-rate multi-sensor fusion with unknown inputs. Inf. Fusion.
- Transient feature extraction by the improved orthogonal matching pursuit and K-SVD algorithm with adaptive transient dictionary. IEEE Trans. Ind. Inform.
- Wind turbine gearbox failure identification with deep neural networks. IEEE Trans. Ind. Inform.
- The optimized deep belief networks with improved logistic sigmoid units and their application in fault diagnosis for planetary gearboxes of wind turbines. IEEE Trans. Ind. Electron.
- A novel feature extraction method using deep neural network for rolling bearing fault diagnosis. IEEE.
- Investigation of activation functions in deep belief network.
- Linearly augmented deep neural network.
Xin Wang received the B. Eng. degree in Automotive engineering from Chongqing University, Chongqing, China, in 2017.
He is currently working toward the M.S. degree in the School of Automotive Engineering of Chongqing University, Chongqing, China. His research interests mainly include signal processing, intelligent mechanical fault diagnosis and artificial intelligence.
Yi Qin received the B. Eng. and Ph.D. degrees in mechanical engineering from Chongqing University, Chongqing, China, in 2004 and 2008 respectively.
Since January 2009, he has been with Chongqing University, Chongqing, China, where he is currently a Professor in the College of Mechanical Engineering. His current research interests include signal processing, fault prognosis, mechanical dynamics and smart structures.
Dr. Qin is a Member of IEEE.
Yi Wang received his B. Eng. degree in mechanical engineering from Southwest Jiaotong University, Chengdu, China, in 2011, and his Ph.D. degree in mechanical engineering from Xi'an Jiaotong University, Xi'an, China, in 2017. From August 2016 to February 2017, he was a visiting scholar at City University of Hong Kong, Hong Kong, China.
He is currently a Lecturer in the College of Mechanical Engineering, Chongqing University, Chongqing, China. His current research interests include mechanical signal processing, weak signal detection, rotating machinery fault diagnosis under speed variation conditions, manifold learning and deep learning.
Sheng Xiang received the B. Eng. degree in mechanical engineering from Yangtze University, Hubei, China, in 2017.
He is currently working toward a Ph.D. degree in mechanical engineering at Chongqing University, Chongqing, China. His research interests mainly include signal processing, mechanical fault diagnosis and residual life prediction.
Haizhou Chen received the M.A. Eng. and Ph.D. degrees in mechanical engineering from Chongqing University, Chongqing, China, in 2010 and 2017 respectively.
Since July 2017, he has been with Qingdao University of Science and Technology, Qingdao, China, where he is currently a Lecturer in the College of Electromechanical Engineering. His current research interests include failure mechanism analysis, fault prognosis and tribology.