Semi-supervised LSTM with historical feature fusion attention for temporal sequence dynamic modeling in industrial processes

doi:10.1016/j.engappai.2022.105547

Engineering Applications of Artificial Intelligence

Volume 117, Part A, January 2023, 105547

https://doi.org/10.1016/j.engappai.2022.105547

Abstract

In modern industrial processes, data-driven soft sensor technology has been widely used to predict key quality variables. Because industrial process data exhibit significant dynamics and nonlinearity, deep learning models such as the long short-term memory (LSTM) network are well suited for temporal sequence dynamic modeling, owing to their long-term memory and strong feature extraction capability. Furthermore, industrial processes generate large amounts of process data with irregular sampling frequencies. However, traditional LSTM can neither fully utilize process data with irregular sampling frequencies nor exploit the guidance that historical data samples provide for feature learning. To address these issues, this paper proposes a novel semi-supervised LSTM with historical feature fusion attention (HFFA-SSLSTM). First, a semi-supervised learning strategy is implemented in LSTM to fully utilize the unlabeled data and to mine the temporal sequence features of labeled and unlabeled samples with irregular sampling frequencies. Then, a novel historical feature fusion attention (HFFA) mechanism is developed, which uses historical hidden features to learn attention scores and thereby obtain weighted features related to historical information. Finally, the extracted features are combined to form the soft sensor model, which performs time series prediction of key quality variables in industrial processes. Experimental results on an actual industrial hydrocracking data set demonstrate the effectiveness of the proposed HFFA-SSLSTM model and its potential for application in real industrial processes.

Introduction

In modern industrial processes, real-time measurements of key quality variables are important indicators for evaluating product quality and ensuring production safety (Zhou et al., 2021, Islam et al., 2022, Fang et al., 2021). Hence, timely measurement of these key quality variables is of great significance for efficient industrial production. However, because measuring instruments are expensive and industrial site environments are harsh, quality variables are difficult to measure online in actual industrial processes (Li et al., 2022, Seifi et al., 2022, Liu et al., 2020). Usually, the relevant products are first sampled at the industrial site by workers carrying sampling equipment, and the collected samples are then sent to the laboratory for analysis to obtain their test values. However, this off-line laboratory analysis involves a long sampling period and time-consuming tests, so its results are of little use to on-site workers. Fortunately, soft sensor technology alleviates the problem of measuring quality variables online in industrial processes (Alcántara et al., 2022, Parvez et al., 2022). In recent years, with the development of distributed control systems (DCS), the number of measurement and actuation devices in industrial processes has continued to grow, allowing large amounts of industrial process data to be collected and stored. Therefore, data-driven soft sensor technology has been widely used for quality prediction in industrial processes (Yao and Ge, 2019, Liu et al., 2022). The core of data-driven soft sensor technology is to establish a mathematical model relating difficult-to-measure quality variables to easy-to-measure process variables, so that the former can be predicted from the latter (Sun and Ge, 2021a, Sun et al., 2022a).
Predictive models built with traditional linear regression are suitable for simple systems but may not be suitable for complex process systems (Ge, 2015, Deng et al., 2022). The rapid development and gradual maturation of machine learning theory have offered new ways to address this problem. Commonly used machine learning methods, including decision trees (Yeo and Grant, 2018), random forests (Chai and Zhao, 2020), artificial neural networks (ANN) (Liu et al., 2021), and Bayesian learning (Zeng and Sycara, 1998), have been applied to data-driven soft sensor modeling. However, these methods all rely on shallow model structures, which are not well suited to complex nonlinear processes.
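To make the contrast between simple and complex systems concrete, the following minimal sketch fits a linear soft sensor by ordinary least squares, mapping easy-to-measure process variables to a quality variable. The synthetic data and the helper names `fit_linear_soft_sensor` and `predict` are hypothetical illustrations, not part of the paper; the point is only that a linear model fits a linear relationship almost perfectly but explains little of a nonlinear one.

```python
import numpy as np


def fit_linear_soft_sensor(X, y):
    """Fit y ~ X @ w + b by ordinary least squares (hypothetical helper)."""
    # Append a column of ones so the intercept b is estimated jointly with w.
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return coef[:-1], coef[-1]


def predict(X, w, b):
    """Linear soft sensor output for new process-variable samples."""
    return X @ w + b


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                    # synthetic process variables
y_lin = X @ np.array([1.5, -2.0, 0.5]) + 0.3     # quality variable linear in X
y_nl = np.sin(3 * X[:, 0]) * X[:, 1] ** 2        # quality variable nonlinear in X

results = {}
for name, y in [("linear", y_lin), ("nonlinear", y_nl)]:
    w, b = fit_linear_soft_sensor(X, y)
    resid = y - predict(X, w, b)
    results[name] = 1 - resid.var() / y.var()    # coefficient of determination R^2
    print(f"{name}: R^2 = {results[name]:.3f}")
```

The low R^2 on the nonlinear target is the gap that nonlinear and, later, deep models are meant to close.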

Since 2006, deep learning has attracted widespread attention in artificial intelligence and machine learning (Hinton, 2007). Unlike traditional shallow networks, deep learning models stack multiple hidden layers to abstract low-level features into high-level features, thereby extracting deep features from the data. Such networks are loosely inspired by the way the human brain analyzes and learns. The most widely used deep learning networks include the multi-layer perceptron (MLP) (Shi et al., 2021), convolutional neural network (CNN) (Long et al., 2015), deep belief network (DBN) (Lian et al., 2020), and stacked autoencoder (SAE) (Sun and Ge, 2022). So far, deep learning has received increasing attention and has been widely applied to data modeling tasks such as quality prediction and process monitoring in industrial processes (Sun et al., 2022b, Huang et al., 2020).

Although the above networks have achieved some success in extracting nonlinear features from industrial process data, industrial process data are typically sampled continuously along the time axis, so these networks cannot be applied directly. In addition, because they are all static networks, they have difficulty extracting useful temporal information and dynamic features from time series process data. The recurrent neural network (RNN), as a time series network, has great potential in time series modeling and is widely used to describe the temporal dynamic behavior of time series data (Lipton et al., 2015). For example, Su et al. introduced an RNN to model the degree of curing during the production of graphite fiber composites (Su et al., 1998). However, the basic RNN suffers from vanishing and exploding gradients, which makes it difficult to model long sequences. Therefore, Hochreiter and Schmidhuber proposed the long short-term memory (LSTM) network to learn long-term dependencies in time series data (Hochreiter and Schmidhuber, 1997). Owing to its excellent ability to extract features from time series data, LSTM has been widely used for sequence modeling, including in industrial processes. For example, Shi et al. proposed convolutional LSTM (ConvLSTM) to model radar echo map time series for precipitation nowcasting (Shi et al., 2015). To capture feature dependencies in data fluctuations, a multi-sequence feature LSTM method that uses feature-temporal patterns for prediction was proposed (Wang et al., 2020b). Wang et al. proposed an LSTM-SAE-based deep learning framework for quality prediction, which uses LSTM to extract comprehensive quality-relevant hidden features from a long sequence at each stage (Wang et al., 2020a).

In addition, because labels for process data are costly and time-consuming to acquire in actual industrial processes, the number of labeled samples is relatively small (Chen et al., 2022). In contrast, abundant unlabeled data are available for data modeling. Semi-supervised modeling strategies have therefore been proposed to handle data modeling with limited labeled data by making full use of both unlabeled and labeled data (Zhu, 2007). Yuan et al. proposed a semi-supervised just-in-time learning framework for soft sensor modeling of nonlinear processes to deal with modeling problems involving unequal lengths and few labeled data (Yuan et al., 2017). To tackle the low prediction accuracy caused by insufficient labeled samples, a semi-supervised robust soft sensor modeling method based on the Student's t mixture model was proposed (Shao et al., 2019). To deal with limited labeled data and abundant unlabeled data, a semi-supervised pre-training strategy based on a semi-supervised stacked autoencoder was designed for deep networks (Yuan et al., 2020). Sun and Ge proposed an ensemble semi-supervised gated stacked autoencoder to address the problem that large numbers of unlabeled samples cannot be fully utilized (Sun and Ge, 2021b). In this approach, gate units quantify the contributions of different hidden layers by connecting them to the output layer. Moreover, in actual industrial processes, process continuity produces strong temporal correlations in the process data along the time dimension, so historical process data contain a large amount of time series feature information. The above semi-supervised methods have achieved good prediction results in industrial applications to some extent.
However, existing methods still cannot effectively solve three difficult issues in industrial process data modeling: (a) efficiently fusing unlabeled and labeled samples; (b) the non-uniform sampling frequency of labeled samples; and (c) the underutilization of large amounts of historical data.

To solve the above issues, this paper proposes a novel semi-supervised LSTM with historical feature fusion attention (HFFA-SSLSTM) algorithm to capture meaningful historical dynamic characteristics in process data. A semi-supervised learning strategy is introduced into LSTM to fully utilize unlabeled data and to mine the temporal sequence features of labeled and unlabeled samples with irregular sampling frequencies. Furthermore, a novel historical feature fusion attention (HFFA) mechanism is developed that employs historical hidden features to learn attention scores and thereby obtain weighted features related to historical information. With limited labeled data and abundant unlabeled data, experimental results on actual industrial process data show that the proposed algorithm can effectively predict the long-term and short-term changes of key quality variables, which alleviates the lag caused by off-line testing and avoids instability in industrial process adjustment. The main contributions of this paper are as follows.

(1) A semi-supervised strategy is introduced into the LSTM model, yielding a model called SSLSTM, to fully mine the temporal features of labeled and unlabeled samples with irregular sampling frequencies.
(2) A novel historical feature fusion attention (HFFA) mechanism is designed to discover the correlation between important information in the historical time data and the current feature vector.
(3) The extracted correlation is preserved to guide SSLSTM in efficiently and directly extracting the spatiotemporal hidden feature information of the time series.
(4) Extensive experimental results on real-world industrial data sets validate the effectiveness of the proposed method and its applicability in real industrial processes.
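The exact HFFA equations are not included in this excerpt, so the following is only a minimal sketch of the general idea of attention over historical hidden features, under stated assumptions: relevance scores are scaled dot products between the current hidden state and the hidden states of the previous k steps, a softmax turns them into weights, and the weighted historical feature is concatenated with the current state. The function `history_attention`, the window size, and the absence of trainable projections are all assumptions for illustration; a learned scoring function would normally replace the plain dot product.

```python
import numpy as np


def softmax(z):
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def history_attention(H_hist, h_t):
    """Hypothetical sketch: weight past hidden states by relevance to the current one.

    H_hist: (k, d) hidden states from the k previous time steps.
    h_t:    (d,)   hidden state at the current time step.
    Returns the fused feature [h_t; context] of shape (2*d,) and the weights (k,).
    """
    d = h_t.shape[0]
    scores = H_hist @ h_t / np.sqrt(d)   # scaled dot-product relevance scores
    alpha = softmax(scores)              # attention weights, non-negative, sum to 1
    context = alpha @ H_hist             # weighted historical feature, shape (d,)
    return np.concatenate([h_t, context]), alpha


rng = np.random.default_rng(1)
H_hist = rng.normal(size=(5, 8))                 # assumed: 5 past steps, 8 hidden units
h_t = H_hist[2] + 0.01 * rng.normal(size=8)      # current state resembling step 2
fused, alpha = history_attention(H_hist, h_t)
print(alpha.round(3), fused.shape)
```

In a full model the fused vector would feed the output layer of the soft sensor; here it only shows how historical hidden features can be turned into a weighted summary.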

The remainder of this paper is organized as follows. Section 2 introduces LSTM and the attention mechanism. Section 3 then reviews the semi-supervised LSTM and describes the proposed HFFA-SSLSTM in detail. Section 4 demonstrates the effectiveness and feasibility of the proposed approach on a real industrial case. Finally, Section 5 concludes the paper.

Section snippets

LSTM

Owing to its memory function, RNN has achieved great success in processing time series data. However, RNNs still suffer from problems such as vanishing or exploding gradients and loss of long-term memory. To overcome these shortcomings, LSTM emerged as a variant of the standard RNN. The internal structure of the LSTM unit is shown in Fig. 1. LSTM contains three gates, namely the input gate, forget gate and output gate. Each gate is responsible for a specific
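The snippet above is cut off before the gate equations. As a hedged illustration, the following sketch implements one step of the widely used standard LSTM cell (forget, input and output gates plus a candidate cell state) in NumPy and runs it over a short sequence. The stacked weight layout, sizes, and random initialization are assumptions for illustration and are not taken from the paper's Fig. 1.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    W: (4*d, d + m) stacked weights for the forget, input, candidate and output blocks
       (an assumed layout); b: (4*d,) stacked biases.
    """
    d = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:d])            # forget gate: how much of c_prev to keep
    i = sigmoid(z[d:2 * d])        # input gate: how much new content to write
    g = np.tanh(z[2 * d:3 * d])    # candidate cell content
    o = sigmoid(z[3 * d:4 * d])    # output gate: how much of the cell to expose
    c_t = f * c_prev + i * g       # additive cell update eases gradient flow over time
    h_t = o * np.tanh(c_t)
    return h_t, c_t


def lstm_forward(X, W, b, d):
    """Run the cell over a sequence X of shape (T, m); return hidden states (T, d)."""
    h, c = np.zeros(d), np.zeros(d)
    H = []
    for x_t in X:
        h, c = lstm_step(x_t, h, c, W, b)
        H.append(h)
    return np.stack(H)


rng = np.random.default_rng(0)
m, d, T = 3, 4, 10                 # assumed sizes: inputs, hidden units, time steps
W = rng.normal(scale=0.5, size=(4 * d, d + m))
b = np.zeros(4 * d)
H = lstm_forward(rng.normal(size=(T, m)), W, b, d)
print(H.shape)                     # (10, 4)
```

The multiplicative gates decide what the cell keeps, writes and exposes, while the additive update of `c_t` is what helps LSTM retain long-term information better than a basic RNN.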

Semi-supervised LSTM network

In actual industrial processes, harsh production environments and expensive online monitoring equipment are common. Therefore, samples must be sent to the laboratory for testing to obtain their label values, which is time-consuming and costly. Moreover, because the number of measuring instruments is limited, industrial processes contain a large amount of unlabeled data. In addition, due to the different attribute
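The snippet is truncated before the SSLSTM details, so the following is only a hedged sketch of one common way to train a recurrent model when labels arrive at irregular times, not necessarily the authors' formulation. The recurrent network still consumes the inputs at every time step, labeled or not, so unlabeled samples shape the hidden states, while the regression loss is evaluated only where a laboratory label exists. The helper `masked_mse`, the label times, and the constant stand-in predictions are hypothetical.

```python
import numpy as np


def masked_mse(y_pred, y_true, mask):
    """MSE computed only over time steps that carry a laboratory label.

    y_pred, y_true: (T,) per-step predictions and targets (targets at
                    unlabeled steps are NaN and are ignored).
    mask:           (T,) boolean, True where a label exists.
    """
    if not mask.any():
        return 0.0
    diff = (y_pred - y_true)[mask]   # NaNs at unlabeled steps are dropped by the mask
    return float(np.mean(diff ** 2))


T = 12
y_true = np.full(T, np.nan)                  # unlabeled steps have no target
labeled_idx = [0, 3, 4, 9]                   # hypothetical irregular label times
y_true[labeled_idx] = [1.0, 1.2, 1.1, 0.9]
mask = ~np.isnan(y_true)

y_pred = np.ones(T)                          # stand-in for the network's per-step output
print(masked_mse(y_pred, y_true, mask))
```

Masking keeps the sequence intact for the recurrence, which is the design choice that lets irregularly sampled labels coexist with densely sampled process inputs.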

Case study

The time series regression modeling framework based on HFFA-SSLSTM is shown in Fig. 4. First, the industrial process data are divided chronologically: the earlier part forms the training set and the later part forms the testing set. The training set includes labeled and unlabeled data. The labeled data include both easy-to-measure input variables and difficult-to-measure target variables, whereas the unlabeled data contain only easy-to-measure input variables. Then, all the data in the
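The chronological split described above can be sketched as follows. This is a minimal illustration under assumptions: the 80/20 split ratio, the NaN convention for missing lab values, the every-seventh-sample labeling pattern, and the helper name `chronological_split` are all hypothetical, since the excerpt does not give the actual split or sampling details.

```python
import numpy as np


def chronological_split(X, y, train_frac=0.8):
    """Hypothetical helper: split time-ordered data without shuffling.

    X: (N, m) process variables, y: (N,) quality variable with NaN where unlabeled.
    The earlier part becomes the training set, the later part the testing set.
    """
    n_train = int(len(X) * train_frac)
    X_train, y_train = X[:n_train], y[:n_train]
    X_test, y_test = X[n_train:], y[n_train:]
    labeled = ~np.isnan(y_train)               # training samples that have lab values
    return (X_train, y_train, labeled), (X_test, y_test)


N = 100
rng = np.random.default_rng(0)
X = rng.normal(size=(N, 3))
y = np.full(N, np.nan)
y[::7] = rng.normal(size=len(y[::7]))          # assumed: a lab value every 7th sample
train, test = chronological_split(X, y)
X_train, y_train, labeled = train
print(len(X_train), int(labeled.sum()), len(test[0]))
```

Splitting by time rather than at random avoids leaking future process behavior into training, which matters for dynamic models that rely on temporal order.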

Concluding remarks

In this paper, a semi-supervised LSTM based on the HFFA mechanism is proposed for temporal sequence dynamic modeling in industrial processes. In the HFFA-SSLSTM model, both the labeled and unlabeled data in the sequence are used to learn dynamic hidden feature states, and a new HFFA mechanism is proposed to capture important temporal sequence feature information in historical process data. In this way, the proposed HFFA-SSLSTM can obtain the most relevant hidden feature vectors

CRediT authorship contribution statement

Yiyin Tang: Conceptualization, Methodology, Review & editing. Yalin Wang: Resources, Software, Writing – original draft. Chenliang Liu: Writing – original draft, Review & editing, Validation. Xiaofeng Yuan: Formal analysis, Investigation. Kai Wang: Investigation, Validation. Chunhua Yang: Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (NSFC) (U1911401), in part by the National Key Research and Development Program of China (2020YFB1713800), in part by the Natural Science Foundation of Hunan Province in China (2021JJ10065, 2022JJ20079), in part by the Training Plan of Outstanding Innovative Youngest of Changsha in China (kq2107007), and in part by the science and technology innovation Program of Hunan Province, PR China (2021RC4054).

References (39)

Alcántara, A., et al., 2022. Direct estimation of prediction intervals for solar and wind regional energy forecasting with deep neural networks. Eng. Appl. Artif. Intell.
Hinton, G.E., 2007. Learning multiple layers of representation. Trends Cogn. Sci.
Huang, K., et al., 2020. Sparse Bayesian learning for network structure reconstruction based on evolutionary game data. Physica A.
Islam, M.K.B., et al., 2022. Long range multi-step water quality forecasting using iterative ensembling. Eng. Appl. Artif. Intell.
Li, S., et al., 2022. Knowledge-based operation optimization of a distillation unit integrating feedstock property considerations. Eng. Appl. Artif. Intell.
Lian, P., et al., 2020. Soft sensor based on DBN-IPSO-SVR approach for rotor thermal deformation prediction of rotary air-preheater. Measurement.
Liu, C., et al., 2021. Deep learning with nonlocal and local structure preserving stacked autoencoder for soft sensor in industrial processes. Eng. Appl. Artif. Intell.
Liu, Y., et al., 2020. Non-ferrous metals price forecasting based on variational mode decomposition and LSTM network. Knowl.-Based Syst.
Parvez, M.R., et al., 2022. Real-time pattern matching and ranking for early prediction of industrial alarm floods. Control Eng. Pract.
Seifi, A., et al., 2022. Multi-model ensemble prediction of pan evaporation based on the copula Bayesian model averaging approach. Eng. Appl. Artif. Intell.
Shao, W., et al., 2019. Semi-supervised robust modeling of multimode industrial processes for quality variable prediction based on Student’s t mixture model. IEEE Trans. Ind. Inform.
Shi, Y., et al., 2021. Productivity prediction of a multilateral-well geothermal system based on a long short-term memory and multi-layer perceptron combinational neural network. Appl. Energy.
Su, H.B., et al., 1998. Monitoring the process of curing of epoxy/graphite fiber composites with a recurrent neural network as a soft sensor. Eng. Appl. Artif. Intell.
Sun, Y., et al., 2022. A multiphase information fusion strategy for data-driven quality prediction of industrial batch processes. Inform. Sci.
Yao, L., et al., 2019. Distributed parallel deep learning of hierarchical extreme learning machine for multimode quality prediction with big process data. Eng. Appl. Artif. Intell.
Yeo, B., et al., 2018. Predicting service industry performance using decision tree analysis. Int. J. Inf. Manage.
Yuan, X., et al., 2020. A novel semi-supervised pre-training strategy for deep networks and its application for quality variable prediction in industrial processes. Chem. Eng. Sci.
Zeng, D., et al., 1998. Bayesian learning in negotiation. Int. J. Human-Comput. Stud.
Zhou, P., et al., 2021. Fast just-in-time-learning recursive multi-output LSSVR for quality prediction and control of multivariable dynamic systems. Eng. Appl. Artif. Intell.
