An experimental methodology to evaluate machine learning methods for fault diagnosis based on vibration signals

https://doi.org/10.1016/j.eswa.2020.114022

Highlights

  • Systematic analysis of overoptimistic results in machine learning fault diagnosis.

  • Computational framework to test new feature models.

  • Computational framework to test new classifier architectures.

  • Experimental analysis of more realistic fault diagnosis scenarios.

Abstract

This paper presents a systematic procedure to fairly compare experimental performance scores for machine learning methods for fault diagnosis based on vibration signals. In the vast majority of related scientific publications, the estimated accuracy and similar performance criteria are the sole quality parameter presented. However, the experimental design giving rise to these results is mostly biased, based on unacceptably simple validation methods and on recycling identical patterns in test data sets, previously used for training. Moreover, the methods in general overfit their hyperparameters, introducing additional overoptimistic results. In order to remedy this defect, we critically analyse the usual training-validation-test division and propose an algorithmic guideline in the form of a validation framework. This allows a well defined comparison of experimental results. In order to illustrate the ideas of the paper, the Case Western Reserve University Bearing Data benchmark is used as a case study. Four distinct classifiers are experimentally compared, under gradually more difficult generalization tasks using the proposed evaluation framework: K-Nearest-Neighbor, Support Vector Machine, Random Forest and One-Dimensional Convolutional Neural Network. An extensive literature review suggests that most vibration based research papers, particularly for the Case Western Reserve University Bearing Data, use similar patterns for training and testing, making their classification an easy task.

Introduction

Software based fault diagnosis is an essential tool to guarantee the safety and maintainability of dynamic processes (Gao et al., 2015, Chiang et al., 2001). A principal distinction among the employable methods is between model-based fault diagnosis, cf. for instance (Varga, 2017, Gertler, 2017, Ding, 2012), and model-free diagnosis, e.g. (Ding, 2016, McMillan and Vegas, 2019). Vibration based fault diagnosis focuses on analysing vibration signals in order to identify possible equipment faults. Some research works try to find signal characteristics or measures for identifying faults (Smith and Randall, 2015, Diaz et al., 2015, Diaz et al., 2017). Other research works focus on machine learning techniques that use the signals for training and testing classifiers for identifying faults. This work focuses on the latter approach.

Access to real world, well documented benchmark data is limited. Only a small set of repositories with real vibration signals, obtained from existing mechanical systems, is available (Lee et al., 2007, Nectoux et al., 2012, Bechhoefer, 2016, Paderborn, 2020, MaFaulDa, 2016). The Case Western Reserve University (CWRU) Bearing Data (CWRU, 2014) is probably the most frequently cited in the scientific literature. Some research papers use vibration data sets that are not publicly available (Liao et al., 2019, Lei et al., 2016, Verstraete et al., 2017).

An extensive amount of research work applies model-free machine learning methods to classify distinct operational states of the process. A robust fault diagnosis system must be able to generalize well. This means that a trained classifier should be able to recognize as many faults as possible, even when the machine conditions vary. Presented results must be reproducible and statistically significant; otherwise overoptimistic results are obtained. For instance, when only a limited number of samples is available, the data must be separated into non-overlapping sets, and no set may be visible to another during the parametrization of the diagnosis system. In particular, the same data must not be used both for tuning the hyperparameters of a classifier model and for testing the resulting classifier; otherwise the results will probably be biased towards optimism. A typical example is the kernel of a Support Vector Machine (SVM) and its associated variables. Which kernel is best for a C-SVM, RBF or polynomial? And once, for instance, the RBF kernel has been chosen, a grid search is used to find the best combination of the regularization value C and the spread γ. Another example is deep learning architectures of artificial neural networks, where the layout is sometimes adjusted until it delivers the best results for the same data set. A more justified way of presenting performance scores is to isolate a test set completely. The hyperparameter tuning and the training are done with the rest of the data. When the final adjustments have been made, the test set is used to estimate, for instance, an accuracy score. This procedure can be repeated several times, but the test set must always be kept apart. This hierarchy will be denoted as the inner and outer loop. Even then, when only a small number of samples is available for training, the estimated performance may surpass theoretical Bayes limits.
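The inner and outer loop described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' framework: the data are synthetic stand-ins generated with make_classification, and the grid values, fold counts and random seeds are arbitrary choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption; these are not vibration features).
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5, random_state=0
)

# Inner loop: grid search over C and gamma of an RBF-kernel C-SVM.
# The grid values below are arbitrary illustration values.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": [0.01, 0.1, 1]}
tuned_svm = GridSearchCV(
    make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    param_grid,
    cv=inner_cv,
)

# Outer loop: every outer test fold is held out from the inner grid search,
# so tuning and performance estimation never share the same samples.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)
outer_scores = cross_val_score(tuned_svm, X, y, cv=outer_cv, scoring="accuracy")

print(f"nested CV accuracy: {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")
```

Placing the scaler inside the pipeline keeps its statistics from being computed on held-out folds, which is the same isolation principle applied to preprocessing.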

Another common practice in vibration based machine learning research consists of defining a class based on the chunks of a single chopped signal, sometimes even with overlapping chunks. Chunks of the same signal are then used in both the training and the testing data sets (Lei et al., 2016, Zhang et al., 2017, Verstraete et al., 2017, Liao et al., 2019). We call this the similarity bias problem. The patterns used for testing are almost indistinguishable from patterns used in the training data set. This may lead to an oversimplified model of real world fault diagnosis problems. A robust system must cover different machine conditions and still be able to provide reasonable diagnostic information.
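One possible way to keep chunks of the same signal out of both sides of a split is a group-aware splitter. The sketch below assumes scikit-learn's GroupKFold with the recording index as the group label; the recordings are random noise used as placeholders, and the helper chop, the chunk length and the class labels are hypothetical choices for illustration, not the division schemes proposed in the paper.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)

def chop(signal, chunk_len):
    """Split a 1-D signal into non-overlapping chunks of chunk_len samples."""
    n_chunks = len(signal) // chunk_len
    return signal[: n_chunks * chunk_len].reshape(n_chunks, chunk_len)

# Six hypothetical recordings (random noise as a stand-in), two per class.
recordings = [rng.standard_normal(4096) for _ in range(6)]
labels = [0, 0, 1, 1, 2, 2]

chunks, y, groups = [], [], []
for rec_id, (signal, label) in enumerate(zip(recordings, labels)):
    c = chop(signal, 512)
    chunks.append(c)
    y.extend([label] * len(c))
    groups.extend([rec_id] * len(c))  # the group is the source recording

X = np.vstack(chunks)
y = np.array(y)
groups = np.array(groups)

# GroupKFold never places chunks of one recording in both train and test.
shared = []
for train_idx, test_idx in GroupKFold(n_splits=3).split(X, y, groups):
    shared.append(set(groups[train_idx]) & set(groups[test_idx]))

print("recordings shared between train and test per fold:", shared)
```

GroupKFold does not stratify by class, so with few recordings per class a fold may lack a class in training; a real experiment would have to design the folds around the available machine conditions.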

An important aspect of experimentally comparing fault diagnosis approaches is to verify statistically whether the results are significantly different. However, this is seldom done in vibration based fault diagnosis research works.
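As a hedged illustration of such a check, the sketch below applies a paired Wilcoxon signed-rank test from SciPy to the per-fold scores of two classifiers evaluated on the same folds. The choice of this particular test, the helper name paired_wilcoxon and the significance level are assumptions made for the example; the score values are invented placeholders and are not results from the paper.

```python
import numpy as np
from scipy.stats import wilcoxon

def paired_wilcoxon(scores_a, scores_b, alpha=0.05):
    """Return (p_value, significant) for paired per-fold scores."""
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    if scores_a.shape != scores_b.shape:
        raise ValueError("both classifiers must be scored on the same folds")
    # Two-sided test on the paired differences scores_a - scores_b.
    _, p_value = wilcoxon(scores_a, scores_b)
    return float(p_value), bool(p_value < alpha)

# Invented placeholder accuracies for ten outer folds (NOT measured results).
scores_a = np.array([0.90, 0.92, 0.88, 0.91, 0.93, 0.89, 0.94, 0.90, 0.92, 0.91])
scores_b = scores_a - np.arange(1, 11) / 100.0

p_value, significant = paired_wilcoxon(scores_a, scores_b)
print(f"p = {p_value:.4f}, significant at 0.05: {significant}")
```

Fold scores obtained from cross validation share training data and are therefore not fully independent, so a test of this kind should be read as an approximation.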

Concern about the reproducibility of scientific research has steadily increased in recent years. Reproducibility is defined as obtaining consistent results using the same data and computational code as the original study. It has been found that many scientific studies are difficult or impossible to reproduce. Vibration based fault diagnosis works often lack reproducibility.

This work proposes an experimental methodology for evaluating machine learning approaches for fault diagnosis based on vibration signals. It aims to cope with all previously mentioned problems, i.e. completely isolating the test set, avoiding the similarity bias, verifying statistically significant differences and also allowing reproducibility. The main contributions of the paper are:

  • 1. A methodology for machine learning applied to vibration signal analysis, integrating nested cross validation, reproducibility, statistical analysis and avoiding similarity bias. Except for the latter, all these topics have already been applied in an isolated manner in the context of fault diagnosis;

  • 2. An experimental study with synthetic data sets suggests superiority of the nested cross validation approach, especially for a small number of training samples;

  • 3. The identification of common problems of research papers in the area of model-free fault diagnosis;

  • 4. Identification of the overoptimistic bias, due to very similar training and test samples of the same class of machine conditions;

  • 5. A study of the CWRU database, the principal real vibration signal source used in the scientific literature of the area.

The rest of the paper is organized in the following manner: Section 2 reviews conventional performance estimation methods in supervised learning problems. Furthermore, the common cross validation techniques are improved by a two-level evaluation hierarchy that isolates performance estimation from hyperparameter tuning. Synthetic data sets with known Bayesian error bounds are used to juxtapose the conventional and improved evaluation methods, suggesting that the results in research papers are mostly overoptimistic. Section 3 focuses on the important aspects of the proposed methodology for avoiding the similarity bias, for checking statistical significance of result differences, and also how to guarantee reproducibility. Section 4 critically reviews research works that use vibration signals in machine learning approaches for fault diagnosis. Special focus is given to the CWRU mechanical data set, emphasizing the applied methods, and highlighting, if applicable, the defects that motivate the elaboration of this study. Moreover, the paper shows how the methodology may be customized for a specific data set. Section 5 and Section 6 present the case study of the CWRU Bearing Data Set. Section 5 defines different experimental modes for fault diagnosis, with gradually increasing difficulties. Section 6 provides an application of the proposed framework for the CWRU data, with four different classifier models, varying the generalization difficulty of the fault diagnosis. Finally, the conclusions are drawn in Section 7.

Section snippets

Supervised learning performance evaluation techniques

In this section, a fair performance estimation framework with an outer validation loop is defined and illustrated with data sets that have analytically known Bayesian error rates.
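To make the notion of an analytically known Bayesian error rate concrete, the following sketch uses a deliberately simple setting that is an assumption of this example and not necessarily one of the data sets used in the paper: two equiprobable Gaussian classes sharing an identity covariance matrix. In that case the Bayes error is Φ(−δ/2), where δ is the Euclidean distance between the class means and Φ is the standard normal distribution function. The function name and the chosen means are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def bayes_error_two_gaussians(mean0, mean1):
    """Bayes error for two equiprobable Gaussians with identity covariance."""
    delta = np.linalg.norm(np.asarray(mean1, float) - np.asarray(mean0, float))
    return float(norm.cdf(-delta / 2.0))

# Hypothetical class means, two units apart.
mean0 = np.zeros(2)
mean1 = np.array([2.0, 0.0])
analytic = bayes_error_two_gaussians(mean0, mean1)

# Monte Carlo check: sample both classes and apply the optimal rule,
# which for this setting assigns each point to the nearer class mean.
rng = np.random.default_rng(0)
n = 200_000
y = rng.integers(0, 2, size=n)
X = rng.standard_normal((n, 2)) + np.where(y[:, None] == 1, mean1, mean0)
pred = (
    np.linalg.norm(X - mean1, axis=1) < np.linalg.norm(X - mean0, axis=1)
).astype(int)
empirical = float(np.mean(pred != y))

print(f"analytic Bayes error:  {analytic:.4f}")
print(f"empirical error rate:  {empirical:.4f}")
```

An accuracy estimate that exceeds one minus this error rate on such data cannot reflect true generalization, which is what makes data sets of this kind useful for exposing overoptimistic evaluation procedures.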

Important additional aspects of the proposed methodology

This section describes three important aspects of the proposed methodology: requirement of reproducibility, avoiding similarity bias, and verifying the statistical significance of the difference in the results.

Related research in vibration based fault diagnosis

Section 2 and Section 3 present an experimental methodology for evaluating model-free machine learning methods for fault diagnosis based on vibration signals. This methodology requires that experiments should use nested cross validation as their evaluation method, be completely described to allow their reproducibility, avoid the similarity bias problem and apply appropriate statistical tests for showing significant differences between the proposed methods and other state of the art methods from

Proposal for CWRU systematic performance comparison

This section initially describes how the CWRU signal files were used to define the classes of the fault diagnosis problem and generate the patterns used for training and testing the classifiers. It also describes which feature extraction models were applied on each chunk of data. The second part of the section presents three proposals for the distribution of patterns with different degrees of difficulty. The first form of division is proposed without concerning the effects of the similarity

Experiments

In the following we report classification experiment results with two principal objectives. Firstly, the tests are a practical application of the proposed nested performance evaluation. Secondly, a more realistic scenario for fault diagnosis is created by the fusion of several machine conditions into a single class. Moreover, in order to evaluate their generalization capabilities, the classifiers are subjected to data that has never been seen during training. We qualitatively show that the high

Conclusion

This work presents a performance evaluation framework for machine learning approaches for fault diagnosis from vibration signals. One experimental study is performed showing that nested cross validation is more reliable than conventional cross validation techniques. The work also identifies common methodological evaluation drawbacks on machine learning approaches for fault diagnosis. Special attention is given to the identification of the similarity bias problem and how it impacts the folds

Computational framework

In order to facilitate experimental comparisons of methods involving the CWRU data, Python source code and the full experimental results of this work are provided at http://bit.ly/2S0Dnhj. The programming framework allows the evaluation of classification, feature extraction and feature selection methods. It is implemented using Numpy, Scikit-learn and Keras, following the design pattern of these libraries. The Python source code of the experiments with the synthetic Fukunaga data can be found at

CRediT authorship contribution statement

Thomas Walter Rauber: Writing - original draft, Writing - review & editing. Antonio Luiz da Silva Loca: Software, Resources, Investigation. Francisco de Assis Boldt: Software, Data curation, Resources. Alexandre Loureiros Rodrigues: Writing - review & editing, Software. Flávio Miguel Varejão: Conceptualization, Methodology, Project administration, Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (77)

  • H. Shao et al.

    Rolling bearing fault diagnosis using adaptive deep belief network with dual-tree complex wavelet packet

    ISA Transactions

    (2017)
  • C. Shen et al.

    Fault diagnosis of rotating machinery based on the statistical parameters of wavelet packet paving and a generic support vector regressive classifier

    Measurement

    (2013)
  • W.A. Smith et al.

    Rolling element bearing diagnostics using the case western reserve university data: A benchmark study

    Mechanical Systems and Signal Processing

    (2015)
  • H. Xu et al.

    An intelligent fault identification method of rolling bearings based on lssvm optimized by improved pso

    Mechanical Systems and Signal Processing

    (2013)
  • C. Yiakopoulos et al.

    Rolling element bearing fault detection in industrial environments based on a k-means clustering approach

    Expert Systems with Applications

    (2011)
  • J.-B. Yu

    Bearing performance degradation assessment using locality preserving projections

    Expert Systems with Applications

    (2011)
  • W. Zhang et al.

    A deep convolutional neural network with new training methods for bearing fault diagnosis under noisy environment and different working load

    Mechanical Systems and Signal Processing

    (2018)
  • Y. Zhang et al.

    A new subset based deep feature learning method for intelligent fault diagnosis of bearing

    Expert Systems with Applications

    (2018)
  • M. Zhao et al.

    Fault diagnosis of rolling element bearings via discriminative subspace learning: Visualization and classification

    Expert Systems with Applications

    (2014)
  • X. Zhao et al.

    An effective procedure exploiting unlabeled data to build monitoring system

    Expert Systems with Applications

    (2011)
  • J.B. Ali et al.

    Application of empirical mode decomposition and artificial neural network for automatic bearing fault diagnosis based on vibration signals

    Applied Acoustics

    (2015)
  • Bechhoefer, E. (2016). A quick introduction to bearing envelope analysis. MFPT Data, http://www. mfpt....
  • Y. Bengio et al.

    Deep learning

    (2017)
  • Y. Benjamini et al.

    Controlling the false discovery rate: A practical and powerful approach to multiple testing

    Journal of the Royal Statistical Society: Series B (Methodological)

    (1995)
  • L. Breiman

    Random forests

    Machine Learning

    (2001)
  • G. Casella et al.
    (2002)
  • G.C. Cawley et al.

    On over-fitting in model selection and subsequent selection bias in performance evaluation

    Journal of Machine Learning Research

    (2010)
  • L. Chiang et al.

    Fault detection and diagnosis in industrial systems. Advanced textbooks in control and signal processing

    (2001)
  • T. Cover et al.

    Nearest neighbor pattern classification

    Information Theory, IEEE Transactions on

    (Jan. 1967)
  • CWRU (2014). Case Western Reserve University, Bearing Data Center. http://csegroups.case.edu/bearingdatacenter,...
  • F. de Assis Boldt et al.

    A fast feature selection algorithm applied to automatic faults diagnosis of rotating machinery

    Journal of Applied Computing Research

    (2014)
  • de Assis Boldt, F., Rauber, T. W., Varejão, F. M. & Ribeiro, M. P. (2015). Fast feature selection using hybrid ranking...
  • Diaz, M., Henríquez, P., Ferrer, M. A., Alonso, J. B., Pirlo, G. & Impedovo, D. (2015). Novel method for early bearing...
  • S. Ding

    Model-based fault diagnosis techniques: Design schemes, algorithms and tools. Advances in industrial control

    (2012)
  • S.X. Ding

    Data-driven design of fault diagnosis and fault-tolerant control systems

    (2016)
  • X. Ding et al.

    Energy-fluctuated multiscale feature learning with deep convnet for intelligent spindle bearing fault diagnosis

    IEEE Transactions on Instrumentation and Measurement

    (2017)
  • R.O. Duda et al.

    Pattern classification

    (2012)
  • L. Eren et al.

    A generic intelligent bearing fault diagnosis system using compact adaptive 1d cnn classifier

    Journal of Signal Processing Systems

    (2019)