Anomaly detection and diagnosis for wind turbines using long short-term memory-based stacked denoising autoencoders and XGBoost

https://doi.org/10.1016/j.ress.2022.108445

  • An anomaly detection and diagnosis method for wind turbines.

  • Abnormal data recognition algorithm based on LOF and adaptive K-means.

  • Normal behavior model based on LSTM-SDAE.

  • Anomaly location by contribution analysis and XGBoost.

Abstract

An anomaly detection and diagnosis method for wind turbines using long short-term memory-based stacked denoising autoencoders (LSTM-SDAE) and extreme gradient boosting (XGBoost) is proposed in this paper. First, an abnormal data recognition algorithm based on the local outlier factor and adaptive K-means was developed to implement data preprocessing and noise extraction. The LSTM-SDAE model was then established to obtain the nonlinear temporal relationship among multivariate variables in normal behavior modes. The Mahalanobis distance was calculated based on reconstruction errors and the threshold for anomaly detection was set with a 99.7% confidence interval for the distribution curve fitted by kernel density estimation. An alarm mechanism based on the sliding window technique was set up to detect abnormalities in real time. Finally, contribution analysis was conducted to extract the parameter features under different abnormal modes, and the XGBoost was trained by extended data from wind turbines of the same type in the same wind farm to realize anomaly location and diagnosis. To verify the proposed method, real SCADA data from a wind farm located in northeastern China were applied. The results show the capability of the proposed method in anomaly detection and diagnosis for wind turbines.

Introduction

With increasing pressure from fossil energy shortages and environmental degradation, optimizing the global energy structure and seeking new energy models have become urgent priorities. As one of the most competitive and promising renewable energy resources, wind energy has shown strong development momentum in recent years. Total global wind energy capacity increased from 194.4 GW in 2010 to 743 GW in 2020, an average annual growth rate of 14.3% according to statistics from the Global Wind Energy Council [1]. Over 469 GW of new onshore and offshore wind capacity is expected to be added in the next five years, or approximately 94 GW of new installations annually until 2025. To achieve its net zero target (peak CO2 emissions before 2030 and carbon neutrality by 2060), China has made a series of commitments to scale up wind and renewable energy capacity. Various ministries and provincial-level bodies are now undertaking strategic measures for planning and implementation [1]. Industrial consensus has been reached to realize 50 GW of annual installations from 2021 to 2025 and 60 GW from 2026 onward. In 2020, China's annual wind power generation was 466.5 TWh, a 15.1% year-on-year increase that accounted for 6.21% of its total power generation. It is also envisioned that 8.4% and 17% of nationwide electricity demand will be met by wind power in 2030 and 2050, respectively [2].

However, the high cost of operation and maintenance (O&M) (10–15% of the total income for onshore farms and 20–25% for offshore wind turbines for a 20-year operating life) has become a major obstacle to the sustainable development of the wind power industry [3]. The main contributing factors are the variable operating environment, complex system structure, limited accessibility, lack of status information, and outdated management practices. Recent studies have sought to reduce O&M cost through condition monitoring systems [4], [5], [6], maintenance strategy optimization [7], [8], [9], [10], [11], [12], decision support systems [13], [14], [15], [16], and maintenance resource management [17], [18], [19]. Meanwhile, as the data-driven economy evolves, enterprises have started to use big data techniques to maximize uptime throughout the production chain and to increase productivity while reducing production cost [20], [21].

Currently, supervisory control and data acquisition (SCADA) systems are installed in most existing wind turbines for real-time remote monitoring, parameter recording, and control. Early anomaly detection for wind turbines based on SCADA data has therefore become attractive, and current methodologies can be divided into two main categories: model-based and data-driven approaches. Model-based methods (e.g., [22], [23], [24]) require accurate mathematical or physical models of wind turbines and their subsystems, which are often unavailable. With the development of artificial intelligence and big data analysis technologies, data-driven methods that mine hidden information in SCADA data have become feasible. Normal behavior models (by neural network (NN) [25], support vector regression [26], Gaussian process [27], cointegration analysis [28], boundary model [29], nonlinear state estimation technique [30], and nonlinear autoregressive NNs with exogenous inputs [31]) are usually established, and a significant deviation from normal behavior (e.g., residuals between model estimates and measured parameters, or distance from the cluster center) can be recognized as an anomaly [32]. Moreover, to exploit high-dimensional and large-scale SCADA data, deep learning has recently gained continuous attention in anomaly detection for wind turbines. Through multilayer nonlinear information processing units, deep learning is capable of modeling high-level abstractions using multiple processing layers with complex structures [33], [34], [35], [36], [37], [38]. However, SCADA data are mutually coupled and interrelated with significant spatiotemporal nonlinear relationships, and most data are collected during normal operating conditions, while faulty data are usually scarce and sometimes even unavailable.

Moreover, several methods (such as statistics-based [39], modeling-based [40], clustering-based [41], and image-based [42]) have been reported for wind turbine abnormal data identification, but two limitations remain: 1) they require high computational overhead; and 2) they are sensitive to the selected data sets or hyperparameters. Thus, an effective and robust recognition algorithm for eliminating abnormal data needs further study.

The motivation of this manuscript is to establish a multiparameter fusion condition monitoring model based on SCADA data. In this study, an anomaly detection and diagnosis method for wind turbines using long short-term memory-based stacked denoising autoencoders (LSTM-SDAE) and extreme gradient boosting (XGBoost) is proposed. An abnormal data recognition algorithm based on the local outlier factor (LOF) and adaptive K-means was first developed to implement data preprocessing and noise extraction. The LSTM-SDAE model was then established to learn the spatial correlation information between several different variables and the temporal characteristics of each variable in normal behavior modes. The Mahalanobis distance (MD) was calculated based on reconstruction errors, and the threshold for anomaly detection was set with a 99.7% confidence interval for the distribution curve fitted by kernel density estimation (KDE). An alarm mechanism based on the sliding window technique was set up to detect abnormalities in real time. Finally, contribution analysis was conducted to extract the parameter features under different abnormal modes, and XGBoost was trained on extended data from wind turbines of the same type in the same wind farm to realize anomaly location and diagnosis.
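The detection chain described above (MD of reconstruction errors, a KDE-fitted threshold, and a sliding-window alarm) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the normal-reference bandwidth rule, the bisection search, and the alarm parameters `window` and `min_exceed` are all assumptions introduced here for illustration, and the reconstruction errors are taken as given rather than produced by an LSTM-SDAE.

```python
import math

import numpy as np


def mahalanobis_distances(errors, mean, cov_inv):
    """Mahalanobis distance of each row of `errors` from `mean`."""
    centered = np.asarray(errors, dtype=float) - mean
    # d_i^2 = centered_i^T @ cov_inv @ centered_i, evaluated row by row.
    sq = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negative rounding errors


def kde_quantile(samples, q=0.997):
    """Value below which a Gaussian KDE of `samples` places probability q."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Normal-reference rule-of-thumb bandwidth (an assumed choice).
    h = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)

    def cdf(x):
        # The KDE's CDF is the average of the Gaussian kernel CDFs.
        z = (x - samples) / (h * math.sqrt(2.0))
        return float(np.mean([0.5 * (1.0 + math.erf(v)) for v in z]))

    lo, hi = samples.min() - 10 * h, samples.max() + 10 * h
    for _ in range(60):  # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sliding_window_alarm(distances, threshold, window=10, min_exceed=8):
    """Alarm at step t if >= min_exceed of the last `window` MDs exceed threshold."""
    exceed = (np.asarray(distances) > threshold).astype(int)
    alarms = np.zeros(exceed.size, dtype=bool)
    for t in range(window - 1, exceed.size):
        alarms[t] = exceed[t - window + 1 : t + 1].sum() >= min_exceed
    return alarms
```

In use, `mean` and `cov_inv` would be estimated from the reconstruction errors on healthy validation data, `kde_quantile` applied to the resulting MDs would give the threshold, and `sliding_window_alarm` would then be run on the MDs of incoming data. Requiring several exceedances within a window, rather than a single one, is one plausible way to keep isolated spikes from triggering an alarm.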

There are two major contributions of this study:

  • 1. An abnormal data recognition algorithm based on the LOF and adaptive K-means.

  • 2. An anomaly detection and diagnosis method for wind turbines using LSTM-SDAE and XGBoost.
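As a rough illustration of the LOF half of the first contribution, the sketch below computes the standard local outlier factor with NumPy. It is a generic textbook formulation rather than the paper's algorithm: the adaptive K-means step is not reproduced, the function name and the neighborhood size `k` are assumptions, and ties and duplicate points are not handled.

```python
import numpy as np


def local_outlier_factor(points, k=5):
    """Standard LOF score per point; values well above 1 suggest an outlier."""
    x = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances; a point is never its own neighbor.
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)

    neighbors = np.argsort(dist, axis=1)[:, :k]        # indices of k nearest
    rows = np.arange(x.shape[0])[:, None]
    neighbor_dist = dist[rows, neighbors]              # distances to them
    k_distance = neighbor_dist[:, -1]                  # distance to k-th neighbor

    # reach-dist(p, o) = max(d(p, o), k-distance(o)) for each neighbor o of p.
    reach = np.maximum(neighbor_dist, k_distance[neighbors])
    lrd = 1.0 / reach.mean(axis=1)                     # local reachability density

    # LOF(p) = mean lrd of p's neighbors divided by lrd of p.
    return lrd[neighbors].mean(axis=1) / lrd
```

For SCADA preprocessing, the rows of `points` could be, for example, normalized wind speed and power pairs; samples whose score exceeds a chosen cutoff would be treated as candidate abnormal data.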

Section snippets

System model

The framework of the wind turbine anomaly detection and diagnosis based on LSTM-SDAE and XGBoost is shown in Fig. 1. In the offline modeling stage, historical SCADA data with both healthy operational data and system alarm logs are collected. The healthy operational data are used to train the LSTM-SDAE based normal behavior model after being processed by the LOF and adaptive K-means algorithm. MDs are then calculated for validation datasets based on residuals, and an alarm threshold η is

SCADA data

The SCADA data used in this research were obtained from an onshore wind farm located in northeastern China. After redundant and uninformative signals were removed, twenty-five parameters with an acquisition interval of 1 min were retained for the condition monitoring of critical components or subsystems, as listed in Table 1. In addition, alarm logs (e.g., fault events, warnings, or other relevant information) indicating changes in the turbine operating state were also obtained from the SCADA

LSTM-SDAE model

To learn the spatiotemporal nonlinear relationship of high-dimensional SCADA variables, the LSTM-SDAE neural network is proposed with the autoencoder (AE) hidden layers replaced by LSTM units, as depicted in Fig. 4. Here $x_t$ is the original input; $x$ is the reconstructed output; $h_t^e$ and $s_t^e$ are the output and memory cell state of the LSTM unit in the encoding process, respectively; $h_t^d$ and $s_t^d$ are the output and memory cell state of the LSTM unit in the decoding process, respectively; and $I_t$ is

Numerical examples

In this section, specific examples from a wind farm located in northeastern China are presented to evaluate the performance of the proposed model in anomaly detection and diagnosis for wind turbines. The SCADA system was officially launched in October 2019 and a large amount of one-minute data has been collected since then, together with various typical alarm records. Normally, SCADA data with a length of 4 months before the occurrence of each typical alarm are separately collected, the first

Conclusions and future work

An anomaly detection and diagnosis method for wind turbines using LSTM-SDAE and XGBoost is proposed in this paper. An abnormal data recognition algorithm based on the LOF and adaptive K-means was first developed to implement data preprocessing and noise extraction. The LSTM-SDAE model was then established to learn the spatial correlation information between several different variables and the temporal characteristics of each variable in normal behavior modes. The MD was calculated based on

CRediT authorship contribution statement

Chen Zhang: Conceptualization, Methodology, Software, Writing – original draft. Di Hu: Conceptualization, Software, Writing – review & editing. Tao Yang: Conceptualization, Validation, Resources, Writing – review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (47)

  • J. Tautz-Weinert et al. Sensitivity study of a wind farm maintenance decision - A performance and revenue analysis. Renew Energy (2019)

  • X. Li et al. A decision support system for strategic maintenance planning in offshore wind farms. Renew Energy (2016)

  • C. Zhang et al. Optimal maintenance planning and resource allocation for wind farms based on non-dominated sorting genetic algorithm-II (2021)

  • T. Yan et al. Joint maintenance and spare parts inventory optimization for multi-unit systems considering imperfect maintenance actions. Reliability Engineering & System Safety (2020)

  • X. Zhang et al. Optimal Condition-based Opportunistic Maintenance and Spare Parts Provisioning for a Two-unit System using a State Space Partitioning Approach. Reliability Engineering & System Safety (2021)

  • R. Sahal et al. Big data and stream processing platforms for Industry 4.0 requirements mapping for a predictive maintenance use case. Journal of Manufacturing Systems (2020)

  • Z. Feng et al. Time-frequency analysis based on Vold-Kalman filter and higher order energy separation for fault diagnosis of wind turbine planetary gearbox under nonstationary conditions. Renew Energy (2016)

  • L. Tao et al. Abnormal Detection of Wind Turbine Based on SCADA Data Mining. Mathematical Problems in Engineering (2019)

  • P.B. Dao et al. Condition monitoring and fault detection in wind turbines based on cointegration analysis of SCADA data. Renew Energy (2018)

  • J. Chen et al. Anomaly detection for wind turbines based on the reconstruction of condition parameters using stacked denoising autoencoders. Renew Energy (2020)

  • U. Saeed et al. Fault diagnosis based on extremely randomized trees in wireless sensor networks. Reliability Engineering & System Safety (2021)

  • J. Lei et al. Fault diagnosis of wind turbine based on Long Short-Term memory networks. Renew Energy (2019)

  • J. Liu et al. Fault prediction of bearings based on LSTM and statistical process analysis. Reliability Engineering & System Safety (2021)