1 Introduction

It is well known that the induction machine (IM) dominates the field of electromechanical energy conversion. These machines are used widely across industry: in the electric utility sector as auxiliary drives in central power plants, to a more limited extent as induction generators in low-MVA power supply systems, and in the mining, petrochemical, aerospace and military sectors (Benbouzid and Kliman 2003). Because an unexpected defect can interrupt production and cause heavy economic loss, continuous condition monitoring is essential. The rolling bearing is one of the most critical parts of the IM (Yu et al. 2018). Failure surveys report that bearing faults account for around 41% of all IM failures. Bearing fault detection and degradation assessment therefore remain an open research challenge, in which important advances have come from signal processing techniques (Mufutau et al. 2017; Ahmed and Nandi 2018).

Vibration analysis is the most widely used technique for preventing machinery breakdowns, because the vibration signal carries rich information on mechanical faults, including rolling bearing faults (Henry and Stephan 2018). The major challenge is to separate effectively the features that adequately represent the fault, especially under noisy, non-linear and non-stationary conditions with which classical approaches cannot cope. Numerous signal processing methods have been introduced to diagnose bearing faults. These techniques fall into three categories: time-domain analysis (Siegel et al. 2012; Rodriguez et al. 2013), frequency-domain analysis (Rai and Mohanty 2007; Fan and Huaizhong 2015; Robert 2017) and time–frequency analysis (Shakya et al. 2015; Chandra and Sekhar 2016; Liu et al. 2018a, b). Time-domain processing extracts information from the vibration signal as a function of time, using statistical features such as variance, skewness, kurtosis and other higher-order moments, crest factor and clearance factor (Diaz et al. 2015).

Frequency analysis methods have also been applied to diagnose bearing faults. These methods include the fast Fourier transform (FFT), power spectrum estimation and envelope spectrum analysis (Hui et al. 2010). However, they assume stationarity and linearity and are usually applied to machines running at fixed speed, which makes them unsuitable for many practical applications. To deal with this situation, time–frequency analysis techniques have been proposed, such as the short-time Fourier transform (STFT) (Hak et al. 2012), the Wigner–Ville distribution (WVD) (Lubo et al. 2018), empirical mode decomposition (EMD) (Zhang and Zhou 2013; Rahul and Dheeraj 2015; Kai et al. 2017; Sun et al. 2017), the Hilbert–Huang transform (HHT) (Ruquiang and Robert 2006; Soualhi et al. 2015) and the wavelet transform (WT) (Hu et al. 2018; Kordestani et al. 2018). The STFT applies the FFT algorithm to a sliding window, so its time and frequency resolutions depend strongly on the type and length of that window. Its inherent drawback is that the window size is fixed and does not necessarily match the time-varying nature of the signal. The WVD gives a much better frequency resolution than the STFT under the same conditions, but it produces a large number of undesirable frequency components, the so-called inner interference terms. EMD is a self-adaptive method that decomposes the signal into a series of intrinsic mode functions (IMFs). Combined with the Hilbert transform (HT), which is used to estimate the instantaneous frequency of each IMF, it forms the well-known HHT. However, problems such as mode mixing, end effects and meaningless negative frequencies (Wei et al. 2016) limit its performance.

The wavelet transform has good local time–frequency properties. It provides both time-domain and frequency-domain information through the inner product between the analysed signal and a predetermined wavelet basis (Fan et al. 2018; Deng et al. 2018). Compared with the STFT, the wavelet transform provides a multi-resolution representation in time and frequency because its analysis window varies with scale. Wavelet packet decomposition (WPD) is an extension of the wavelet transform that also splits the high-frequency bands.

Classification is another important task, needed to automate monitoring and fault recognition. The rapid development of artificial intelligence has produced intelligent methods that have been applied successfully in many fields, such as image recognition, speech recognition and fault diagnosis (Khunarsa 2017; Bai et al. 2018; Liu et al. 2018a, b). These classification methods include the support vector machine (SVM) (Gu et al. 2018), K-means singular value decomposition (K-SVD) (Yuan et al. 2017; Ptuchar and Savakis 2014), the artificial neural network (ANN) (Asfani et al. 2012; Aditi et al. 2016; Wang et al. 2017) and others (Deng et al. 2017; Feng et al. 2017; Toufik and Mohamed 2018).

In this regard, this paper presents a vibration-based contribution to the detection and classification of rolling bearing faults, built on WPD and a multilayer perceptron network. The work has two main purposes. The first is to decompose the vibration signal into frequency-band components via WPD and to retain the component most sensitive to the bearing defect as the fault signal. The second is fault classification, based on features extracted from this fault signal, for which a multilayer perceptron was chosen as the supervised network.

2 Wavelet transform

2.1 Discrete wavelet transform (DWT)

The wavelet transform has been shown to be an effective tool for analysing both transient and steady-state power system signals. A wavelet atom is a scale dilation and time translation of the mother wavelet function, which leads to time and frequency resolutions that depend on the scale. For a given discrete signal x(n), the discrete wavelet transform (DWT) is given by (Kia et al. 2009):

$$w(j,k) = \sum\limits_{n} {x(n)\,2^{ - \tfrac{j}{2}} \psi (2^{ - j} n - k)}$$
(1)

\(\psi (t)\) is a time function with finite energy and fast decay called the mother wavelet. The DWT splits the analysed discrete signal into frequency bands whose widths follow powers of two (dyadic sampling). In multiresolution analysis based on the Mallat algorithm, the signal is decomposed into approximation (low-pass band) and detail (high-pass band) coefficients, which represent its low- and high-frequency components, respectively. This is achieved by successive low-pass and high-pass filtering of the time-domain signal, each followed by downsampling by two, as defined by the following expressions:

$$y_{l} (n) = \sum\limits_{k} {G(2n - k)x(k)}$$
(2)
$$y_{h} (n) = \sum\limits_{k} {H(2n - k)x(k)}$$
(3)

The two-level decomposition procedure is shown in Fig. 1, where G and H are the low-pass and high-pass wavelet filters, respectively. This method has been widely applied in mechanical fault diagnosis (Ebrahimi et al. 2012; Yan et al. 2014). However, at each level the DWT further decomposes only the low-frequency sub-band.
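The filtering and downsampling of Eqs. (2) and (3) can be sketched in a few lines of Python. This is an illustration only: the Haar filter pair, the zero-padding at the signal edges and the names `filter_and_downsample` and `dwt_level` are assumptions made for the example, not choices stated in this paper.

```python
import math

# Haar analysis filters (assumed here purely for illustration).
G = [1 / math.sqrt(2), 1 / math.sqrt(2)]    # low-pass
H = [1 / math.sqrt(2), -1 / math.sqrt(2)]   # high-pass


def filter_and_downsample(x, taps):
    """y(n) = sum_k taps(2n - k) * x(k); x is taken as zero outside its range."""
    n_out = (len(x) + len(taps)) // 2
    y = []
    for n in range(n_out):
        acc = 0.0
        for m, tap in enumerate(taps):   # m stands for 2n - k
            k = 2 * n - m
            if 0 <= k < len(x):
                acc += tap * x[k]
        y.append(acc)
    return y


def dwt_level(x):
    """One decomposition level: (approximation, detail) coefficients."""
    return filter_and_downsample(x, G), filter_and_downsample(x, H)


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx1, detail1 = dwt_level(signal)
# Second level: only the approximation (low-frequency) branch is split again.
approx2, detail2 = dwt_level(approx1)
```

Because the Haar pair is orthogonal, the energy of `signal` is shared exactly between the approximation and detail coefficients at each level, which gives a quick sanity check on the implementation.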

Fig. 1 Two-level multiresolution DWT decomposition

2.2 Wavelet packet decomposition

The wavelet packet decomposition (WPD) is an extension of the DWT in which both the low-pass and the high-pass bands are split at every stage. At level j, the WPD therefore generates \(2^{j}\) frequency bands, each of bandwidth \(F_{e} /2^{j + 1}\), where \(F_{e}\) is the sampling frequency, and the center frequency of node n is given by:

$$f_{c} (j,n) = (n + \frac{1}{2})\frac{{F_{e} }}{{2^{j + 1} }}$$
(4)

For three levels, the recursive WPD process is represented as a binary tree (Fig. 2), in which each wavelet packet is labelled by its depth (scale parameter) j and node index (frequency parameter) n. The coefficient at depth 0 is the original signal. At depth 1 this signal is split into a low-frequency part and a high-frequency part. At depth 2 both parts are split again, and the process is repeated at each depth until the useful information is obtained. WPD therefore offers a richer and more flexible analysis than the DWT, and in particular a finer, adjustable frequency resolution at high frequencies (Farajzadeh et al. 2016; Huo et al. 2017).
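As a worked example of Eq. (4), the short sketch below lists the band edges and center frequencies of every node at a given level. It assumes that the node index n counts the bands in order of increasing frequency, as Eq. (4) implies; many wavelet packet implementations instead number nodes in natural (filter-bank) order, in which case an index mapping is needed first. The function name `wpd_bands` is hypothetical.

```python
def wpd_bands(fs, level):
    """Return (low edge, center, high edge) in Hz for each node at `level`.

    Nodes are assumed to be indexed in order of increasing frequency.
    """
    width = fs / 2 ** (level + 1)          # bandwidth Fe / 2^(j+1)
    return [
        (n * width, (n + 0.5) * width, (n + 1) * width)   # center from Eq. (4)
        for n in range(2 ** level)
    ]


# Level-3 decomposition of a signal sampled at 12 kHz.
bands = wpd_bands(12_000, 3)
for n, (low, center, high) in enumerate(bands):
    print(f"node {n}: {low:6.0f} to {high:6.0f} Hz, center {center:6.0f} Hz")
```

For a 12 kHz sampling frequency and three levels this gives eight bands of 750 Hz each, covering 0 to 6000 Hz.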

Fig. 2 WPD binary tree decomposition

3 Experimental results

The bearing fault diagnosis methodology is presented in Fig. 3. It comprises the following steps: signal collection, signal preprocessing using WPD, extraction of the feature most indicative of the defect, and classification. Each step is detailed in the following sections.

Fig. 3 The proposed methodology for rolling element bearing (REB) fault diagnosis

3.1 Experimental setup

The investigation in this paper is based entirely on vibration data obtained from the Case Western Reserve University bearing data center website (Case Western university). The data were collected at a sampling frequency of 12 kHz for four load conditions (0, 1, 2 and 3 hp) by accelerometers mounted at the drive end and the fan end of the induction motor housing. The motor is coupled to a load that can be varied over its operating range. Data were gathered for four bearing conditions: healthy (H), inner race fault (IRF), outer race fault (ORF) and ball fault (BF). Single-point faults (cyclic defects) were introduced artificially by electro-discharge machining. Two fault diameters, 0.007 and 0.021 inch, were considered for the inner race, the outer race and the ball.

3.2 Signal processing data

The vibration signal contains abundant information about the operating state of a machine. When the machine breaks down, the fault information must therefore be extracted accurately and completely, which is indispensable for improving the accuracy and reliability of fault diagnosis (Sheng et al. 2018). Selecting the frequency band in which the signal is most impulsive is a significant step in bearing fault diagnosis, because this band carries the fault information. To reveal the frequency and bandwidth of the resonance(s) excited by the bearing fault, the fast Fourier transform (FFT) is first applied to the signal of a bearing with a ball defect. Figure 4 illustrates the vibration spectrum and its smoothed envelope (SE). The spectrum envelope (env) is obtained from the following equations:

$$X(f) = \left| {FFT(vib\,(t))} \right|$$
(5)
$$HT(X(f)) = Y(f) = \frac{1}{\pi }\int\limits_{ - \infty }^{ + \infty } {\frac{X(\tau )}{{\tau - f}}\,d\tau }$$
(6)
$$env(f) = \sqrt {X(f)^{2} + Y(f)^{2} }$$
(7)

where FFT is the fast Fourier transform, HT is the Hilbert transform and vib is the temporal vibration signal.
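Eqs. (5) to (7) can be sketched as follows, using only the Python standard library. The naive `dft` helper, the FFT-based discrete Hilbert transform in `analytic` and the synthetic test signal are assumptions made for this illustration; any additional smoothing applied to obtain the SE is not specified here and is therefore not reproduced. The sign convention of the Hilbert transform does not affect the envelope of Eq. (7).

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform, adequate for a short demo."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def analytic(u):
    """Return u + j*HT(u) by suppressing the negative-frequency half of the DFT."""
    n = len(u)
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        upper = n // 2
    else:
        upper = (n + 1) // 2
    for k in range(1, upper):
        gain[k] = 2.0
    return dft([s * g for s, g in zip(dft(u), gain)], inverse=True)


def spectrum_envelope(vib):
    """Return the one-sided magnitude spectrum X and its envelope env."""
    X = [abs(c) for c in dft(vib)[: len(vib) // 2]]   # Eq. (5)
    z = analytic(X)                                   # X(f) + j*Y(f), Eq. (6)
    env = [abs(c) for c in z]                         # Eq. (7)
    return X, env


# Synthetic two-tone signal standing in for a measured vibration record.
vib = [math.sin(2 * math.pi * 8 * t / 64) + 0.5 * math.sin(2 * math.pi * 11 * t / 64)
       for t in range(64)]
X, env = spectrum_envelope(vib)
```

By construction the envelope never falls below the magnitude spectrum, since env(f) is the modulus of a complex quantity whose real part is X(f).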

Fig. 4 Vibration spectrum and its SE (ball defect)

It is worth pointing out that the sensitivity of the SE to a bearing fault depends on the shape of the vibration spectrum. Figure 5 shows the experimental results for a healthy bearing and for bearings with inner race, outer race and ball defects. It can be seen that the frequency band associated with the faults is located at [2000–4000] Hz. In addition, within this range the SE magnitude is sensitive to the type of bearing defect. Figure 6 displays the SE for different load levels (0, 2 and 3 hp) with an inner race defect. These results show that load variation does not affect the frequency range mentioned above. Thus, in terms of energy, the information on the state of the ball bearing (healthy or defective) is contained in this range. To extract this information, the vibration signal is segmented using WPD. The goal of the segmentation is to obtain wavelet coefficients that represent the original signal in the resonance frequency band, which makes the detection and diagnosis of bearing defects possible.

Fig. 5 SE for different faults at low load (1796 RPM)

Fig. 6 SE for an inner race fault at different loads

Fig. 7 a Vibration spectrum, b W(3, 6) component spectrum (bearing ball fault)

Fig. 8 a Vibration spectrum, b W(3, 6) component spectrum (bearing inner race fault)

Fig. 9 a Vibration spectrum, b W(3, 6) component spectrum (bearing outer race fault)

3.2.1 Feature extraction

As mentioned above, the location of the [2000–4000] Hz frequency band excited by the bearing defect is not sensitive to load, whereas the SE amplitudes within this band are clearly sensitive to the type of defect. The WPD decomposition level can therefore be fixed in advance according to the diagnosis objective. In this work it was fixed at level 3, and the vibration signal was analysed by WPD with a Daubechies mother wavelet. The signals associated with the resulting WPD coefficients depend on the frequency components embedded in the analysed signal. Figures 7, 8 and 9 show the spectra of the vibration signal and of the wavelet packet component w(3, 6) under faulty conditions (BF, IRF and ORF). As can be seen, the largest amplitude peaks of w(3, 6) for the faulty bearing signals appear within [2000–4000] Hz, and these amplitudes clearly change with the defect type. To build a robust diagnosis system, the coefficients of w(3, 6) are therefore selected to form the characteristic feature, defined as their RMS value:

$$C = \sqrt {\frac{1}{{N_{3,6} }}\sum\limits_{i = 1}^{{N_{3,6} }} {(d_{3,6} (i))^{2} } }$$
(8)

where \(d_{3,6} (i)\) are the coefficients of the packet w(3, 6) and \(N_{3,6}\) is the number of coefficients retained after discarding those affected by boundary effects.
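The feature of Eq. (8) is straightforward to compute once the packet coefficients are available. In the sketch below, the function name `rms_feature` and the symmetric `trim` argument, used to drop boundary-affected coefficients, are assumptions; how many coefficients to discard is not specified here.

```python
import math


def rms_feature(coeffs, trim=0):
    """RMS of the packet coefficients (Eq. 8), ignoring `trim` samples at each end."""
    kept = list(coeffs[trim: len(coeffs) - trim]) if trim else list(coeffs)
    if not kept:
        raise ValueError("no coefficients left after trimming")
    return math.sqrt(sum(d * d for d in kept) / len(kept))


# Placeholder coefficients; the two end values mimic boundary artefacts.
d36 = [100.0, 1.0, -1.0, 1.0, -1.0, 100.0]
C = rms_feature(d36, trim=1)
```

With `trim=1` the two large end values are excluded, so the feature reflects only the interior coefficients.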

Figure 10 illustrates the evolution of the feature C with the defect severity fault (DSF) (fault diameters of 0.007 and 0.021 inch) for the different bearing states (healthy, BF, IRF and ORF) and under different loads (0, 1, 2 and 3 hp). The results show that the values of this criterion can be grouped into four separate sets corresponding to healthy and faulty behavior. This feature, extracted for different DSF values, is thus an efficient indicator for discriminating between healthy and faulty bearing states, and it also allows the fault type to be identified. To automate the detection and location of bearing defects, the feature is used, after normalization, as the input to the ANN.

Fig. 10 RMS of packet W(3, 6) for different DSF (0.007 and 0.021 inch), for different states (healthy, BF, IRF and ORF) and under different loads (0, 1, 2 and 3 hp)

3.3 Application of artificial neural network to fault diagnosis

ANNs are considered an alternative way to tackle complex and ill-defined problems because they learn from examples. An ANN is a collection of simple processing units, mutually interconnected, with weights assigned to the connections. By modifying these weights according to a learning rule, the ANN can be trained to recognize patterns in the training data (Bouzid et al. 2007). Known for its capacity for generalization, the ANN is considered a very powerful tool for nonlinear data processing, which justifies its use as the principal tool for diagnosing induction machine defects. ANNs can be configured with different architectures.

In the literature there are two main categories of structure: static and dynamic networks. The most widely used in industrial monitoring are the multilayer perceptron (MLP) and the radial basis function (RBF) network. In this work, the MLP structure is adopted. An ANN is an assembly of formal neurons, strongly connected by synaptic connections, whose structure is inspired by that of the human brain. Each artificial neuron is an elementary processor. It receives a number of inputs \(In_{j}\), weighted by synaptic coefficients \(W_{ij}\), and generates the output \(out_{i}\), in the general case as:

$$out_{i} = f\left( {\sum\limits_{j = 1}^{n} {W_{ij} } In_{j} + b_{i} } \right)$$
(9)

where \(i = 1, \ldots ,m\), f is the activation function, which gives the ANN its non-linear character, and \(b_{i}\) is the bias of the ith neuron.
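Eq. (9) amounts to a weighted sum followed by an activation, evaluated for each of the m neurons of a layer. The following sketch assumes a sigmoid activation; the weights, biases and the name `layer_output` are illustrative values, not parameters from this work.

```python
import math


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_output(weights, biases, inputs, f=sigmoid):
    """Eq. (9): out_i = f(sum_j W_ij * In_j + b_i) for every neuron i."""
    return [
        f(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Two neurons (m = 2) fed by two inputs (n = 2); values chosen arbitrarily.
W = [[0.5, -0.25],
     [1.0, 1.0]]
b = [0.0, -1.0]
out = layer_output(W, b, [2.0, 4.0])
```

For the first neuron the weighted sum is exactly zero, so its sigmoid output is 0.5; the second neuron receives a sum of 5 and saturates close to 1.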

3.3.1 Structure of ANN

First, a feedforward MLP network trained by back-propagation is used. To select a neural network configuration with good generalization, several tests were carried out, and the network that produced the minimum validation error was selected as the optimum. Network performance was measured by the mean square error (MSE), defined by:

$$MSE = \frac{1}{N}\sum\limits_{i = 1}^{N} {(e_{i} )^{2} = } \frac{1}{N}\sum\limits_{i = 1}^{N} {(S_{i} - out_{i} )^{2} }$$
(10)

where \(e_{i}\) and \(S_{i}\) are the error and the desired output, respectively.
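A direct transcription of Eq. (10) is given below as a minimal sketch. The function name `mse` and the example values are hypothetical; whether N counts training examples, output neurons or both is not stated here, so the function simply averages over whatever paired values it receives.

```python
def mse(desired, outputs):
    """Mean square error between desired outputs S_i and computed outputs out_i."""
    if len(desired) != len(outputs):
        raise ValueError("desired and outputs must have the same length")
    if not desired:
        raise ValueError("at least one value is required")
    return sum((s - o) ** 2 for s, o in zip(desired, outputs)) / len(desired)


# Desired code for a ball fault against an imperfect network response.
error = mse([1.0, 0.0, 0.0], [0.9, 0.1, 0.0])
```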

The adopted ANN configuration is presented in Fig. 11. This network has one input, C (the RMS of w(3, 6)), a hidden layer of five neurons, and three outputs that indicate a bearing defect in the ball, the inner race and the outer race, respectively. The activation function of the hidden and output layers is the sigmoid.

Fig. 11 Neural network architecture

3.3.2 Experimental results

Before the ANN can be deployed, two stages are essential: training and testing. Many learning algorithms are available to train an artificial neural network, and they can be classified as unsupervised or supervised. The MLP is a supervised network, which needs a desired response in order to be trained. In this work the back-propagation algorithm, based on correcting the error between the desired output and the computed output, is used. A training data base, formed by the relevant inputs and the desired outputs, was used to train the ANN. For the ANN to be exploited effectively, this data base should cover the complete range of operating conditions, including healthy and faulty cases. To enrich the data base, experimental data collected from the different accelerometer positions were used, and each group was normalized to values between 0 and 1. Eighty percent of the experimental data, after processing and normalization, was used as the training data base. It covers the healthy state and each bearing fault location (BF, IRF and ORF) under different loads (0, 1, 2 and 3 hp). Figure 12 shows the training input and output data set. The input data consist of 54 examples: 6 examples of healthy operation and 3 sequences of 16 examples corresponding to ball, inner race and outer race faults, respectively. The training output data set is formed by the following desired outputs \(S_{i}\), which indicate the state of the bearing:

Fig. 12 Input and output training data base of the ANN classifier

S1 = 1 for a ball fault; otherwise, S1 = 0;

S2 = 1 for an inner race fault; otherwise, S2 = 0;

S3 = 1 for an outer race fault; otherwise, S3 = 0.

Therefore, the output states of the NN are set to the following:

  • [1 0 0] ball fault;

  • [0 0 0] healthy state;

  • [0 1 0] inner race fault;

  • [0 0 1] outer race fault.
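The output coding listed above can be turned back into a bearing state by thresholding the three network outputs. The sketch below is one possible reading: the 0.5 threshold, the "undetermined" fallback for codes outside the four listed states and the names `TARGETS` and `decode` are assumptions, since the decision rule is not described here.

```python
# Desired output vectors (S1, S2, S3) for each bearing state.
TARGETS = {
    "healthy": (0, 0, 0),
    "ball fault": (1, 0, 0),
    "inner race fault": (0, 1, 0),
    "outer race fault": (0, 0, 1),
}


def decode(outputs, threshold=0.5):
    """Map three sigmoid outputs to a bearing state (threshold is assumed)."""
    bits = tuple(1 if o >= threshold else 0 for o in outputs)
    for label, code in TARGETS.items():
        if bits == code:
            return label
    return "undetermined"   # e.g. two outputs active at the same time


state = decode([0.02, 0.91, 0.10])
```

A response such as `[0.9, 0.8, 0.1]` matches none of the four codes and is reported as undetermined rather than forced into a class.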

The results are obtained after 4000 epochs. The training outputs are shown in Fig. 13. The ANN has learned the input data well and correctly reproduces the desired outputs.
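For readers who wish to experiment, a network with one input, five hidden neurons, three outputs and sigmoid activations can be trained by back-propagation with a few dozen lines of Python. Everything beyond those stated properties is an assumption in this sketch: the four synthetic samples stand in for normalized feature values and are not the data of this study, and the learning rate, weight initialization, per-sample updates and epoch count were chosen only to keep the example short.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(c, W1, b1, W2, b2):
    """Forward pass of the 1-5-3 network for a scalar input c."""
    hidden = [sigmoid(w * c + b) for w, b in zip(W1, b1)]
    out = [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
           for row, b in zip(W2, b2)]
    return hidden, out


def train(samples, epochs=500, lr=0.5, seed=0):
    """Per-sample gradient descent on the squared error; returns weights and MSE history."""
    rng = random.Random(seed)
    W1 = [rng.uniform(-1, 1) for _ in range(5)]
    b1 = [rng.uniform(-1, 1) for _ in range(5)]
    W2 = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(3)]
    b2 = [rng.uniform(-1, 1) for _ in range(3)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for c, target in samples:
            hidden, out = forward(c, W1, b1, W2, b2)
            total += sum((s - o) ** 2 for s, o in zip(target, out)) / 3
            # Error terms of the output layer, then propagated to the hidden layer.
            d_out = [(o - s) * o * (1 - o) for o, s in zip(out, target)]
            d_hid = [h * (1 - h) * sum(d_out[i] * W2[i][j] for i in range(3))
                     for j, h in enumerate(hidden)]
            for i in range(3):
                for j in range(5):
                    W2[i][j] -= lr * d_out[i] * hidden[j]
                b2[i] -= lr * d_out[i]
            for j in range(5):
                W1[j] -= lr * d_hid[j] * c
                b1[j] -= lr * d_hid[j]
        history.append(total / len(samples))
    return (W1, b1, W2, b2), history


# Synthetic placeholder samples: (normalized feature, desired outputs).
samples = [(0.05, (0, 0, 0)), (0.35, (1, 0, 0)), (0.65, (0, 1, 0)), (0.95, (0, 0, 1))]
weights, history = train(samples)
```

The list `history` holds the mean square error of each epoch, so plotting it gives a learning curve of the kind normally used to decide when to stop training.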

Fig. 13 ANN training outputs

The ANN was then tested on each of the three fault types. Figure 14 shows the test set presented to the neural network and the corresponding outputs. The results are fully satisfactory: the neural network did not fail on any case. According to these test results, the ANN is able to distinguish correctly between healthy and faulty operation and to identify each type of fault.

Fig. 14 Test inputs and outputs of the ANN classifier

4 Conclusion

This paper presented an effective diagnosis system that combines wavelet transform and neural network techniques. It was shown that the vibration spectrum in the frequency band [2000–4000] Hz carries useful information about the bearing state. To extract the characteristics and patterns embedded in this frequency band, a statistical feature, the RMS of the wavelet packet coefficients w(3, 6), is used as the fault indicator. The diagnostic process is then automated by monitoring this RMS value with a simple MLP ANN. The experimental results have shown the effectiveness of the proposed diagnosis system.