
Neurocomputing

Volume 275, 31 January 2018, Pages 167-179

Remaining useful life estimation of engineered systems using vanilla LSTM neural networks

https://doi.org/10.1016/j.neucom.2017.05.063

Highlights

  • Dynamic differential features – inter-frame dynamic changes contain a great amount of model degradation information; therefore, a dynamic difference technique is used to extract new features from the original datasets.

  • Higher accuracy – the vanilla LSTM achieves higher accuracy than the standard RNN and the GRU. As the research objects become more complex, the prediction accuracy does not decrease noticeably.

  • Advanced regularization mechanism and optimization algorithm – dropout is used to improve the generalization ability of the vanilla LSTM, while the Adam algorithm is used to reduce the effect of the learning rate on the final optimization results.

Abstract

Long Short-Term Memory (LSTM) networks are a significant branch of Recurrent Neural Networks (RNN), capable of learning long-term dependencies. In recent years, the vanilla LSTM (a refinement of the original LSTM) has become the state-of-the-art model for a variety of machine learning problems, especially in Natural Language Processing (NLP). In industry, however, this powerful Deep Neural Network (DNN) has not yet attracted wide attention. In research on Prognostics and Health Management (PHM) technology for complex engineered systems, Remaining Useful Life (RUL) estimation is one of the most challenging problems; accurate estimates allow appropriate maintenance actions to be scheduled proactively to avoid catastrophic failures and minimize economic losses. Accordingly, this paper proposes utilizing vanilla LSTM neural networks, which make the most of the long short-term memory ability, to obtain good RUL prediction accuracy in cases of complicated operations, working conditions, model degradations and strong noise. In addition, to improve the characterization of model degradation processes, a dynamic difference technique is proposed to extract inter-frame information. The whole proposition is illustrated and discussed through tests on the health monitoring of aircraft turbofan engines under four different problem settings. The performance of the vanilla LSTM is benchmarked against the standard RNN and the Gated Recurrent Unit (GRU) LSTM. The results show the significant performance improvement achieved by the vanilla LSTM.

Introduction

In industry, the remaining useful life estimation of a system or component usually depends on operating conditions and sensor readings. Obviously, the more historical data are available, the more accurate the predictions will be. At the data competition of the 1st International Conference on Prognostics and Health Management (PHM08), Peel used Multi-Layer Perceptron (MLP) and Radial Basis Function (RBF) networks [1] to estimate the RUL of aero-engines, while Heimes used a classical RNN. Heimes's algorithm performed better than Peel's owing to the RNN's hidden units, which implicitly contain information about the history of all past elements in the sequence [2]. By virtue of weight sharing and its feedback structure, a recurrent neural network is able to use all of the historical conditions and sensing data for prediction with low model complexity. Nowadays, recurrent neural networks have become one of the important subfields of deep learning [3]. They have been widely used to generate sequences in domains as diverse as music [4], speech [5], text [6] and motion capture data [7].

Unfortunately, unfolding the recurrent connections of an RNN shows that it is equivalent to a very deep feedforward network. This gives rise to the problem of "long-term dependencies": gradients propagated over many time steps tend to vanish or explode, making it hard for the network to learn to store information for very long [8]. Over the past two decades, researchers made every effort to solve this issue and proposed several variants; the two most famous are Echo State Networks (ESN) [9] and LSTM [10]. On one side, since learning the recurrent and input weights is difficult, Jaeger and Haas [9] proposed fixing those weights such that the recurrent hidden units capture the history of past inputs well, and learning only the output weights. This is the core of the echo state network. Echo state networks have been shown to be an effective RNN variant and have achieved some success in RUL prediction problems, such as the satellite lithium battery RUL estimation developed by Hong [11]. On the other side, the central idea behind the LSTM architecture is a memory cell which can maintain its state over time, together with non-linear gating units which regulate the information flow into and out of the cell. However, the original LSTM (which had no forget gate), proposed by Hochreiter and Schmidhuber in 1997 [10], did not perform well. The most commonly used LSTM architecture nowadays was originally introduced by Graves and Schmidhuber [12] and is commonly referred to as the vanilla LSTM. The vanilla LSTM has a forget gate that allows continual tasks to be learned, and it is trained with full gradients instead of fixing part of the weights as the ESN does. A number of variants, such as the GRU [13], were later derived from the vanilla LSTM; however, Greff et al. carried out a thorough comparison of the popular variants and found that "vanilla LSTM performs reasonably well on various datasets and using any of eight possible modifications does not significantly improve the LSTM performance" [14]. These modifications included the GRU.
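For reference, the forward pass of a vanilla LSTM block can be written as follows. This is a minimal formulation without peephole connections (some presentations, including the variants compared by Greff et al. [14], add peephole weights); x_t denotes the current input, h_{t-1} the previous block output, c_t the cell state, \sigma the logistic sigmoid and \odot element-wise multiplication:

\begin{align*}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(block output)}
\end{align*}

The forget gate f_t is what distinguishes the vanilla LSTM from the original 1997 formulation: it lets the cell state be reset, which is what allows continual tasks to be learned.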

Above all, the aim of this paper is to utilize vanilla LSTM, which usually deals with supervised learning on language modeling, and related state-of-the-art technologies of feature extraction to improve accuracy in RUL prediction problems for complicated industrial objects. The main contributions of this paper are as follows:

  • (1)

    Add dynamic differential features – raw features in the RUL estimation problem are often stationary, but the inter-frame dynamic changes (observed by the sensors under different operating conditions) contain a great amount of model degradation information. Therefore, a dynamic difference technique is used to extract new features from the original health monitoring datasets (a minimal sketch of this preprocessing step is given after this list).

  • (2)

    Higher prediction accuracy – the vanilla LSTM achieves higher prediction accuracy than the standard RNN and the GRU under the same number of hidden neurons in a single layer. As the research objects become increasingly complex, the prediction accuracy obtained by the model does not decrease noticeably.

  • (3)

    Advanced regularization mechanism and optimization algorithm – the dropout mechanism is used to improve the generalization ability of the vanilla LSTM, while an advanced optimization algorithm (Adam) is used to reduce the effect of the learning rate on the final optimization results.
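To make contribution (1) concrete, a minimal sketch of the dynamic difference preprocessing is given below: first-order, cycle-to-cycle differences of every sensor channel are appended to the raw readings as additional features. The array layout and the zero-padding of the first cycle are illustrative assumptions rather than the paper's exact preprocessing.

import numpy as np

def add_dynamic_difference_features(unit_readings: np.ndarray) -> np.ndarray:
    # unit_readings: (num_cycles, num_sensors) readings of one engine unit.
    # Returns (num_cycles, 2 * num_sensors): raw values plus inter-frame differences.
    diffs = np.diff(unit_readings, axis=0)                              # (num_cycles - 1, num_sensors)
    diffs = np.vstack([np.zeros((1, unit_readings.shape[1])), diffs])   # no difference exists for the first cycle
    return np.hstack([unit_readings, diffs])

# Example with 5 cycles of 3 hypothetical sensor channels.
readings = np.random.rand(5, 3)
features = add_dynamic_difference_features(readings)
print(features.shape)  # (5, 6)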

This paper is organized as follows. The application background of neural networks in RUL estimation of engineered systems is given in Section 2, which illustrates the advantages and drawbacks of classical neural networks when dealing with RUL estimation problems. On this basis, Section 3 proposes using the vanilla LSTM to make full and effective use of historical data to assess the RUL; this section also briefly introduces the main schemes of the vanilla LSTM. The performance of the vanilla LSTM is benchmarked on aircraft turbofan engine datasets from NASA in Section 4. Four different issues are considered: a single fault and single operating mode problem, a single fault and multiple operating modes problem, a hybrid fault and single operating mode problem, and a hybrid fault and multiple operating modes problem. Through comparisons with the standard RNN and the GRU LSTM, the excellent performance of the vanilla LSTM in the RUL estimation field is demonstrated. Finally, Section 5 concludes this work and proposes some future directions.
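As a concrete illustration of this benchmarking setup, the sketch below builds vanilla LSTM, standard RNN and GRU regressors with the same number of hidden neurons in a single layer, applies dropout for regularization, and trains them with the Adam optimizer under an MSE loss, mirroring the comparison carried out in Section 4. The hidden size, dropout rate, learning rate and the use of the last hidden state as the regression input are illustrative assumptions, not the paper's tuned configuration.

import torch
import torch.nn as nn

class RULRegressor(nn.Module):
    def __init__(self, cell: str, num_features: int, hidden_size: int = 32):
        super().__init__()
        # Same recurrent interface for the three benchmarked cell types.
        rnn_cls = {"lstm": nn.LSTM, "gru": nn.GRU, "rnn": nn.RNN}[cell]
        self.rnn = rnn_cls(num_features, hidden_size, batch_first=True)
        self.dropout = nn.Dropout(p=0.5)        # regularization on the last hidden state (assumed rate)
        self.head = nn.Linear(hidden_size, 1)   # scalar RUL estimate

    def forward(self, x):                        # x: (batch, time, num_features)
        out, _ = self.rnn(x)
        last = out[:, -1, :]                     # hidden state at the final observed cycle
        return self.head(self.dropout(last)).squeeze(-1)

def train_step(model, optimizer, x, rul_target):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), rul_target)  # MSE, as used to evaluate the trained models
    loss.backward()
    optimizer.step()
    return loss.item()

# Stand-in data only: 8 sequences of 30 cycles with 24 (raw + differential) features.
x = torch.randn(8, 30, 24)
y = torch.rand(8) * 150.0                        # stand-in RUL labels, in cycles
for cell in ("rnn", "gru", "lstm"):
    model = RULRegressor(cell, num_features=24)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam reduces sensitivity to the learning rate
    print(cell, train_step(model, optimizer, x, y))

In the actual experiments, each model would of course be trained for many epochs on the turbofan datasets and evaluated on a held-out cross-validation set rather than on a single batch of random data.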

Section snippets

Neural networks in RUL estimation

As one of the most important members of the field of machine learning, the neural network is considered a mature prognostic algorithm. Multilayer perceptrons, radial basis function networks and other neural networks have been widely used in anomaly detection, damage clustering and fault diagnosis, and they have achieved remarkable success [1], [15], [16], [17].

As for time series data, such as the samples in the RUL prediction problem, researchers have been searching for more reasonable models and

Concept of vanilla LSTM

After refinement and popularization, this variant of the LSTM, the vanilla LSTM, is the one most commonly used in the literature. The schematic of the vanilla LSTM block is shown in Fig. 1.

As shown in Fig. 1, the core idea of the LSTM lies in the information flows represented by the two black horizontal lines. The bottom one indicates the combination of the input at the current time (x_t) and the output of the previous time step (h_{t-1}). In the classical RNN, this integrated information is used to overwrite the cell state directly. As for

Experiments and discussion

The aim of this part is to demonstrate fast modeling and the enhanced performance of the vanilla LSTM in the challenge of RUL estimation, in comparison with the standard RNN and the GRU LSTM. Experiments are carried out on four aircraft turbofan engine simulation datasets, into which faults unknown to the data analysts are injected and which operate under different complex conditions with noise. In the model training phase, the Mean Square Error (MSE) on the cross-validation set is used to evaluate the performance of the trained neural

Conclusion

In this paper, vanilla LSTM neural networks, which usually work effectively in the field of natural language processing, are utilized to solve the bottleneck problem of high-precision RUL estimation for complicated engineered systems. In addition, a dynamic difference technique is proposed to extract new features from raw health monitoring data, with which RNNs can make full use of inter-frame information to uncover the real physical degradation mechanism behind sensor readings under complex and

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant no. 51375030. First of all, I would like to thank the NASA Ames Research Center for providing the turbofan engine degradation simulation data set. Secondly, I would like to express my sincere thanks to my supervisors Mei Yuan and Shaopeng Dong, who have given me so much useful advice on my writing and have tried their best to improve my paper. Last but not least, I would like to thank my junior apprentice Lin Li and


References (26)

  • I. Sutskever et al.

    The recurrent temporal restricted Boltzmann machine

    Advances in Neural Information Processing Systems

    (2009)
  • Y. Bengio et al.

    Learning long-term dependencies with gradient descent is difficult

    IEEE Trans. Neural Netw.

    (1994)
  • H. Jaeger et al.

    Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication

    Science

    (2004)

    Yuting Wu is a Master's candidate jointly educated by the School of Automation Science and Electrical Engineering and the School of Energy and Power Engineering, Beihang University. He received the Bachelor's degree from the School of Information Science and Engineering, Central South University, Changsha, China, in 2014. His research interests include machine learning, data mining and deep learning.

    Mei Yuan is an Associate Professor at the School of Automation Science and Electrical Engineering, Collaborative Innovation Center for Advanced Aero-Engine, Beihang University, Beijing, China. Her current research interests include prognostics and health management, advanced signal processing, embedded systems, and structural health monitoring of complex systems. She is currently a member and the Secretary of the GNC branch of the Chinese Society of Aeronautics and Astronautics, a director and member of the SHM branch of the Chinese Instrument and Control Society, and a senior member of the Chinese Metrology Society.

    Shaopeng Dong is currently a Lecturer and working towards the Ph.D. degree at School of Automation Science and Electrical Engineering, Beihang University, Beijing, China. He received the Bachelor's degree in Automation from China Agriculture University, Beijing, China in 2004. He received the Master's degree in Detection Technology and Automatic Equipment from Beihang University, Beijing, China in 2007.

    His main research interests include prognostics and health management, embedded systems, signal processing, and structural health monitoring of complex systems.

    Li Lin is a Master's candidate in the School of Energy and Power Engineering, Beihang University. She received the Bachelor's degree from the School of Electric Engineering and Automation, Hefei University of Technology, Anhui, China, in 2015. Her research interests include machine learning, automatic test systems and sensor technology.

    Yingqi Liu is currently a Master's candidate in the School of Automation Science and Electrical Engineering at Beihang University. He graduated from the Ecole Centrale de Pekin at Beihang University, Beijing, China, in 2015. His main research interests include prognostics and health management (PHM), deep learning, machine learning and data mining.
