1 Introduction

Bearing elements are crucial components of induction motors and a leading cause of their failures. If defects in a bearing are not detected in time, they can result in significant equipment damage, interruption of production, and costly repairs. Therefore, early detection of bearing faults is an integral part of condition-based maintenance, helping operators schedule timely maintenance for the equipment.

Many techniques have been proposed for diagnosing bearing faults based on vibration signals. For low-speed bearings, however, fault detection and diagnosis (FDD) has shown significantly better performance with acoustic emission (AE) signals, since AE-based methods are highly sensitive to the low-energy acoustic emissions released by developing cracks in a bearing [1, 2]. Therefore, this study employs the AE signal to diagnose defects in the bearing.

One of the most common methods for diagnosing localized defects in a bearing is envelope power spectrum (EPS) analysis. In an EPS-based method, defects are detected by identifying peaks at the characteristic defect frequencies in the power spectrum of the envelope signal. However, these frequencies depend on the rotational speed of the machine as well as on the bearing geometry, so the method may not be suitable when the rotational speed of the equipment changes during operation. Another widely used approach is the data-driven method, in which as many features as possible are extracted from the acquired signals to characterize each fault distinctly and are then used to build discriminative models for classifying different bearing faults. Since bearing fault signals (e.g., the AE signals in this study) are inherently non-stationary and non-linear, frequency-domain analysis alone is often ineffective for extracting the information needed to detect faults. To address this issue, we propose a method that divides a signal into four sub-bands using the discrete wavelet packet transform (DWPT), which is better suited to analyzing non-stationary signals [3]. A set of four feature vectors is then extracted, one from each sub-band, instead of the single feature vector used in other methods [4]. This yields a high-dimensional feature pool that misses little information but can contain redundant and irrelevant features. Therefore, a feature selection technique based on the separability index (SI) of the features is applied, where the SI is an evaluation metric that ensures the quality of the selected features. The selected features are not only discriminative from the classifier's point of view, but also yield a simpler model (i.e., a low-dimensional feature vector) for the classifier. Finally, a two-tier classifier (TTC) is applied to classify each fault type; it consists of two tiers, support vector machine groups (SVM-Groups) and a multilayer perceptron (MLP).

2 The Proposed Methodology for Bearing Fault Diagnosis

A block diagram of the proposed method is shown in Fig. 1. In this method, we use the discrete wavelet packet transform (DWPT) with the Daubechies 15 (db15) mother wavelet to decompose the original signal into four sub-bands. In DWPT, each sub-band represents a resolution level of the original signal, and the number of sub-bands is determined empirically by checking that discriminative fault information is sufficiently extracted. For each sub-band signal, a total of twenty-two features are extracted: sixteen time-domain features, namely peak value (PV), root mean square (RMS), kurtosis value (KV), crest factor (CF), clearance factor (CRF), impulse factor (IF), shape factor (SF), entropy value (EV), skewness value (SV), square mean root (SMR), 5th normalized moment, 6th normalized moment, mean value (MV), peak-to-peak value (PPV), margin factor (MF), and kurtosis factor (KF); and six frequency-domain features, namely frequency center (FC), RMS frequency (RMSF), root variance frequency (RVF), kurtosis frequency (KF), entropy frequency (EF), and sum square frequency (SSF).

Fig. 1. The proposed method for bearing fault diagnosis using DWPT analysis and a two-tier classifier
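As an illustration of the decomposition and feature extraction step described above, the following minimal Python sketch uses the PyWavelets package to split a signal into four frequency-ordered sub-bands with the db15 wavelet and computes a few of the listed features; the exact feature definitions and the sampling rate are assumptions rather than values taken from the paper.

```python
import numpy as np
import pywt
from scipy.stats import kurtosis, skew

def dwpt_subbands(signal, wavelet="db15", level=2):
    """Decompose a signal into 2**level frequency-ordered DWPT sub-bands."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    return [node.data for node in wp.get_level(level, order="freq")]

def subband_features(x, fs):
    """A subset of the 22 features described above (definitions assumed)."""
    rms = np.sqrt(np.mean(x ** 2))                 # root mean square (RMS)
    peak = np.max(np.abs(x))                       # peak value (PV)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    fc = np.sum(freqs * spectrum) / np.sum(spectrum)                   # frequency center (FC)
    rmsf = np.sqrt(np.sum(freqs ** 2 * spectrum) / np.sum(spectrum))   # RMS frequency (RMSF)
    return {"PV": peak, "RMS": rms, "CF": peak / rms,
            "KV": kurtosis(x), "SV": skew(x), "FC": fc, "RMSF": rmsf}

# One feature vector per sub-band, i.e., four vectors per AE signal.
# fs is the AE sampling rate (not stated in this section, so a placeholder):
# features = [subband_features(b, fs=250_000) for b in dwpt_subbands(ae_signal)]
```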

Feature selection is an essential step in such feature-based methods. It aims to remove non-useful features and select a subset of relevant ones, thereby reducing the dimensionality of the feature space and the complexity of the system while improving performance metrics such as accuracy [3]. In this study, we propose a new method for selecting informative features based on the separability index (SI) of the features. The selected features then form a feature vector that is used as input to a classifier. Finally, a two-tier classifier (TTC) is used, in which the first tier is a combination of support vector machine groups (SVM-Groups) acting as the primary classifier, and the second tier is a multilayer perceptron (MLP) that takes the outputs of the SVM-Groups in the first tier as its input and makes the final decision for classifying bearing faults. In the following sub-sections, details of the feature selection method and the two-tier classifier are given.

2.1 Feature Selection Method Based on a Separability Index

A feature is regarded as useful if it represents the samples of different fault classes as distinct clusters in its feature space. In Fig. 2(a), the shape factor (SF) and square mean root (SMR) features describe the pattern of each class as a dense cluster that is entirely separated from the other clusters, which means that in this feature space a classifier can easily separate the groups of faulty signals. These features are therefore considered useful, whereas the peak value and impulse factor features are redundant, as shown in Fig. 2(b). In addition, a feature is considered good if the variance among samples of the same class is small while the separation between samples of different classes is large. Figure 2(c) and (d) show examples of features that are considered good and redundant, respectively.

Fig. 2. Representation of fault classes in different feature spaces

In this study, we use a separability index (SI) to evaluate the discriminative power of each feature and to compare it with the other features. This index is the ratio between the degree of separation (DoS) between samples of different fault classes and the average sparseness of the fault classes (SoC), i.e., the spread of samples within the same class, in the feature space. The details of this algorithm are described as follows:

Algorithm (figure a): steps for computing the separability index (SI) of each feature.
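Since the algorithm figure is not reproduced here, the following is only a minimal sketch of a separability index in the spirit of the description above, assuming DoS is the mean pairwise distance between class means of a feature and SoC is the mean within-class standard deviation; the authors' exact definitions may differ.

```python
import numpy as np
from itertools import combinations

def separability_index(feature, labels):
    """SI = DoS / SoC for a single feature over all samples.

    feature: 1-D array with one feature value per sample.
    labels:  array of fault-class labels of the same length.
    """
    classes = np.unique(labels)
    means = {c: feature[labels == c].mean() for c in classes}
    spreads = {c: feature[labels == c].std() for c in classes}
    # Degree of separation: average distance between class centres.
    dos = np.mean([abs(means[a] - means[b]) for a, b in combinations(classes, 2)])
    # Sparseness of classes: average within-class spread.
    soc = np.mean(list(spreads.values()))
    return dos / soc

# Rank all features by SI and keep the ten highest-scoring ones:
# si = np.array([separability_index(X[:, j], y) for j in range(X.shape[1])])
# selected = np.argsort(si)[::-1][:10]
```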

After performing the steps of the SI algorithm, we choose the ten features with the highest separability indices as the most discriminative features in the feature pool. These features form the feature vector used as input to the classifier.

2.2 The Architecture of the Two-Tier Classifier

In this study, we use a two-tier classifier for classifying faults. The first tier is a combination of eight SVM-Groups, corresponding to the eight bearing conditions that need to be classified. Each SVM-Group consists of four member SVMs, each trained with the feature vector extracted from the corresponding sub-band. Thus, the eight SVMs (one per group) associated with the first sub-band are trained on the feature vectors extracted from that sub-band using the one-against-all multi-class (OAA MC) SVM framework, and the same procedure is then carried out for the SVMs associated with the second, third, and fourth sub-bands.
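The following sketch (using scikit-learn, with placeholder hyperparameters) illustrates this first tier: for each of the four sub-bands, eight one-against-all RBF SVMs are trained on that sub-band's selected features, and their decision values are stacked as input for the second tier.

```python
import numpy as np
from sklearn.svm import SVC

N_CLASSES = 8  # BFD, BCO, BCI, BCR, BCIO, BCOR, BCIR, BCIOR

def train_svm_groups(subband_features, y):
    """subband_features: list of four arrays, each of shape (n_samples, 10)."""
    tier1 = []
    for X_band in subband_features:
        # One binary "class c vs. the rest" SVM per fault class for this sub-band.
        band_svms = [SVC(kernel="rbf", gamma="scale", C=1.0)
                     .fit(X_band, (y == c).astype(int)) for c in range(N_CLASSES)]
        tier1.append(band_svms)
    return tier1

def decision_values(tier1, subband_features):
    """Stack the signed distances f(x) of all 4 x 8 = 32 SVMs into one matrix."""
    cols = [svm.decision_function(X_band)
            for band_svms, X_band in zip(tier1, subband_features)
            for svm in band_svms]
    return np.column_stack(cols)
```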

An SVM is a supervised binary classification algorithm that finds the hyperplane with the maximum margin between the two classes in the feature space. Nonlinear classification problems can be solved by mapping samples into a high-dimensional feature space using kernel functions [5]. In this study, we use the Gaussian radial basis function (RBF) as the kernel function and Lagrange multipliers to solve the optimization problem. The corresponding classification function of the SVM is as follows:

$$ F(x) = \operatorname{sgn} \left\{ f(x) \right\}, \quad \text{where} \quad f(x) = \sum\limits_{i = 1}^{n} \alpha_{i}^{*} y_{i}^{*} K(x_{i}^{*}, x) + b $$
(4)

Here, \( \alpha_{i}^{*} \) is the Lagrange multiplier corresponding to the support vector \( (x_{i}^{*}, y_{i}^{*}) \). The decision value \( f(x) \) is the signed distance of an unknown observation \( x \) from the decision boundary and ranges from \( - \infty \) to \( + \infty \). Figure 3 shows the outputs of the SVMs in the first tier for 240 signal samples from the 8 fault classes. As shown in the figure, the SVM-Groups reduce the ten original features to decision values that are more discriminative.

Fig. 3. The decision values of SVMs in SVM-Groups
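As a hedged illustration of Eq. (4), the sketch below recovers the decision value \( f(x) \) of one fitted binary RBF SVM from its support vectors, its dual coefficients (the products \( \alpha_{i}^{*} y_{i}^{*} \)), and its bias \( b \); X_train, y_train_binary, and the kernel width are illustrative names and values, not taken from the paper.

```python
import numpy as np
from sklearn.svm import SVC

gamma = 0.5  # illustrative RBF kernel width
svm = SVC(kernel="rbf", gamma=gamma).fit(X_train, y_train_binary)

def f_of_x(x):
    # K(x_i*, x) for every support vector x_i*, then Eq. (4).
    k = np.exp(-gamma * np.sum((svm.support_vectors_ - x) ** 2, axis=1))
    return float(np.dot(svm.dual_coef_[0], k) + svm.intercept_[0])

# f_of_x(x_new) matches svm.decision_function([x_new])[0];
# the predicted label is F(x) = sign(f_of_x(x_new)).
```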

The second tier of the TTC is a multilayer perceptron (MLP) that uses the decision values of the SVMs as its input. In this study, we use an MLP with one input layer, one hidden layer, and one output layer. The size of the input layer depends on the number of SVMs in the first tier, and the size of the output layer depends on the number of classes the MLP needs to distinguish. We train the MLP with the Bayesian regularization backpropagation algorithm, which is based on Levenberg-Marquardt optimization. Instead of the standard cost function, this algorithm uses a cost function that combines the squared errors with the network weights, which makes the neural network less prone to overfitting by forcing it to have smaller weights and biases, thereby producing a well-generalized network.
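A minimal sketch of the second tier is shown below, reusing the decision_values helper and the tier1, subbands_train/subbands_test, and y_train names from the first-tier sketch. scikit-learn's MLPClassifier does not offer Bayesian regularization backpropagation (a MATLAB trainbr feature), so an L2 weight penalty is used here as a rough stand-in, and the hidden-layer size is a guess.

```python
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(hidden_layer_sizes=(20,),  # one hidden layer; size assumed
                    alpha=1e-2,                # L2 penalty as a stand-in for Bayesian regularization
                    max_iter=2000, random_state=0)

# Decision values of the 32 first-tier SVMs form the MLP input (n_samples, 32).
D_train = decision_values(tier1, subbands_train)
mlp.fit(D_train, y_train)

D_test = decision_values(tier1, subbands_test)
y_pred = mlp.predict(D_test)
```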

3 Experimental Results and Discussion

We evaluate the effectiveness of the proposed method using acoustic emission (AE) signals obtained from a bearing in the defect-free state (BFD) and with seven different defects. These localized defects include single defects, i.e., a crack on the outer raceway (BCO), a crack on the inner raceway (BCI), and a crack on the roller (BCR); and combined defects, i.e., cracks on the inner and outer raceways (BCIO), cracks on the outer raceway and roller (BCOR), cracks on the inner raceway and roller (BCIR), and cracks on the inner raceway, outer raceway, and roller (BCIOR). To simulate the shaft-speed variations that occur during real operation, we record data at six rotational speeds from 250 to 500 revolutions per minute (r/min) in increments of 50 r/min. For each bearing condition and each rotational speed, we use 90 data samples with a signal length of 5 s to construct the dataset. Thus, each dataset consists of \( N_{RPM} \times N_{Classes} \times N_{Signals} = 4320 \) samples, where \( N_{RPM} \) is the number of operating speeds at which the AE signals are recorded \( \left( {N_{RPM} = 6} \right) \), \( N_{Classes} \) is the total number of defect types or bearing conditions \( \left( {N_{Classes} = 8} \right) \), and \( N_{Signals} \) is the number of AE signals recorded for each bearing condition at each shaft speed \( \left( {N_{Signals} = 90} \right) \). We use two datasets in this study, corresponding to crack sizes of 3 mm and 12 mm. The datasets are described in detail in Table 1.

Table 1. Description of the datasets for two different crack sizes.
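For concreteness, a small sketch of how the dataset described above can be indexed; the fault-class abbreviations and speeds are taken from the text.

```python
# 6 shaft speeds x 8 bearing conditions x 90 AE signals = 4320 samples per crack size.
speeds = range(250, 501, 50)  # r/min
classes = ["BFD", "BCO", "BCI", "BCR", "BCIO", "BCOR", "BCIR", "BCIOR"]
index = [(rpm, cls, i) for rpm in speeds for cls in classes for i in range(90)]
assert len(index) == 4320
```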

To demonstrate the effectiveness of the proposed method, we compare it with the envelope analysis-based methods of [4] and [4]* in terms of average classification accuracy and average sensitivity. In [4]*, instead of using all 22 features as in the original method, we use only 9 features, namely the narrow-band RMSF values of the bands around the BPFI, BPFO, and 2xBSF defect frequencies up to the third harmonic. These features are then evaluated and selected using an outlier-insensitive hybrid feature selection algorithm, and the selected features are used to detect bearing faults with a k-NN classifier. Table 2 presents the results of the experiment. It is clear that the proposed method yields better results than the method in [4]*, which relies only on information about the defect frequencies to diagnose bearing faults. The average classification accuracies of the proposed method are 99.1% and 99.6% for dataset 1 and dataset 2, respectively, which represents a significant improvement over the methods in [4] and [4]*.

Table 2. Average classification accuracies and sensitivities for single and combined bearing defects on the variable-speed datasets
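The reported metrics can be computed as in the following sketch, assuming y_true and y_pred are the true and predicted labels produced by the TTC on a test set; average sensitivity here is taken to be the macro-averaged recall over the eight classes.

```python
from sklearn.metrics import accuracy_score, recall_score

accuracy = accuracy_score(y_true, y_pred)                           # average classification accuracy
sensitivity_per_class = recall_score(y_true, y_pred, average=None)  # one value per fault class
average_sensitivity = recall_score(y_true, y_pred, average="macro")
```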

4 Conclusions

In this paper, a new method was presented to diagnose bearing faults under variable operating conditions, namely six different shaft speeds. In the proposed method, the original signal was split into multiple sub-bands, and from each sub-band the most discriminative features were selected to form a feature vector for training the SVMs and the MLP in the TTC framework. Our experimental results showed that the proposed method can efficiently diagnose both single and combined bearing defects under variable rotational speed conditions with high classification accuracy.