Semi-supervised LSTM with historical feature fusion attention for temporal sequence dynamic modeling in industrial processes

https://doi.org/10.1016/j.engappai.2022.105547

Abstract

In modern industrial processes, data-driven soft sensor technology has been widely used for the prediction of key quality variables. Because dynamics and nonlinearity are important characteristics of industrial process data, deep learning models such as the long short-term memory (LSTM) network are well suited for temporal sequence dynamic modeling, owing to their excellent long-term memory function and feature extraction capability. Furthermore, industrial processes generate a large amount of process data with irregular sampling frequencies. However, the traditional LSTM cannot fully utilize process data with irregular sampling frequencies or the guidance value of historical data samples for feature learning. To address these issues, a novel semi-supervised LSTM with historical feature fusion attention (HFFA-SSLSTM) model is proposed in this paper. First, the semi-supervised learning strategy is implemented in LSTM to fully utilize the unlabeled data and mine the temporal sequence features of labeled samples and unlabeled samples with irregular sampling frequencies. Then, a novel historical feature fusion attention (HFFA) mechanism is developed, which utilizes historical hidden features to learn attention scores for obtaining weighted historical information-related features. Finally, the extracted features are combined to form the soft sensor model to perform time series prediction tasks for key quality variables in industrial processes. The experimental results on an actual industrial hydrocracking data set demonstrate the effectiveness of the proposed HFFA-SSLSTM model and its potential for application in real industrial processes.

Introduction

In modern industrial processes, the real-time measured values of key quality variables are important indicators for evaluating the production quality of industrial products and ensuring production safety (Zhou et al., 2021, Islam et al., 2022, Fang et al., 2021). Hence, the timely measurement of these key quality variables is of great significance for efficient industrial production. However, due to the limitations of expensive measuring instruments and harsh industrial site environments, the values of quality variables are difficult to obtain by on-line measurement in actual industrial processes (Li et al., 2022, Seifi et al., 2022, Liu et al., 2020). Usually, samples are first collected at the industrial site by workers carrying sampling equipment. The collected samples are then sent to the laboratory for analysis to obtain their corresponding test values. However, this off-line laboratory analysis has a long sampling period and a time-consuming test procedure, so its results are of little use to the on-site workers. Fortunately, the introduction of soft sensor technology alleviates the problem of online measurement of quality variables in industrial processes (Alcántara et al., 2022, Parvez et al., 2022). In recent years, with the development of distributed control systems (DCS), the number of measurement and execution devices in industrial processes has continued to increase, so that a large amount of industrial process data is now collected and stored. Therefore, data-driven soft sensor technology has been widely used in the quality prediction of industrial processes (Yao and Ge, 2019, Liu et al., 2022). The core of data-driven soft sensor technology is to establish a mathematical model relating difficult-to-measure quality variables to easy-to-measure process variables, so that the former can be predicted from the latter (Sun and Ge, 2021a, Sun et al., 2022a).
A predictive model established by a traditional linear regression algorithm is suitable for simple systems, but may not be suitable for complex process systems (Ge, 2015, Deng et al., 2022). The rapid development of machine learning and the gradual maturing of its theory offer a way to address this problem. Commonly used machine learning methods, including decision trees (Yeo and Grant, 2018), random forests (Chai and Zhao, 2020), artificial neural networks (ANN) (Liu et al., 2021), and Bayesian learning (Zeng and Sycara, 1998), have been used for data-driven soft sensor modeling. However, these methods are all based on shallow model structures, which are not well suited to complex nonlinear processes.

In 2006, the emergence of deep learning attracted widespread attention in the field of artificial intelligence and machine learning (Hinton, 2007). Unlike traditional shallow networks, deep learning constructs a multi-layer perceptron with multiple hidden layers to abstract low-level features into high-level features, thereby extracting deep features from the data. It is a kind of neural network that simulates the way the human brain analyzes and learns. The most widely used deep learning networks include the multi-layer perceptron (MLP) (Shi et al., 2021), convolutional neural network (CNN) (Long et al., 2015), deep belief network (DBN) (Lian et al., 2020), and stacked autoencoder (SAE) (Sun and Ge, 2022). So far, deep learning has received increasing attention and has been widely applied to data modeling tasks such as quality prediction and process monitoring in industrial processes (Sun et al., 2022b, Huang et al., 2020).

Although the above networks have achieved some success in extracting nonlinear features from industrial process data, industrial process data is sampled continuously along the time series, which makes these networks not directly applicable. In addition, since they are all static networks, it is difficult for them to extract useful time series information and dynamic features from time series process data. The recurrent neural network (RNN), as a time series network, has great potential in time series modeling and is widely used to describe the temporal dynamic behavior of time series data (Lipton et al., 2015). For example, Su et al. introduced RNN for data modeling of curing degree during the production process of graphite fiber composites (Su et al., 1998). However, the basic RNN is affected by the vanishing and exploding gradient problems, which make it difficult to model long sequences. Therefore, Hochreiter and Schmidhuber proposed the long short-term memory (LSTM) network to learn long-term dependency information in time series data (Hochreiter and Schmidhuber, 1997). Owing to its excellent ability to extract features from time series data, LSTM has been widely used in data sequence modeling of industrial processes. For example, Shi et al. proposed Convolutional LSTM (ConvLSTM) to model radar echogram time series data for the precipitation nowcasting problem (Shi et al., 2015). To capture feature dependencies in data fluctuations, a multi-sequence feature LSTM method for prediction using feature-temporal patterns was proposed (Wang et al., 2020b). Wang et al. proposed a deep learning framework based on LSTM-SAE for quality prediction, which utilizes LSTM to extract comprehensive quality-relevant hidden features from a long sequence at each stage (Wang et al., 2020a).

In addition, because acquiring labels for process data in actual industrial processes is costly and time-consuming, the amount of labeled process data is relatively small (Chen et al., 2022). In contrast, there is a wealth of unlabeled data that can be utilized for data modeling. Therefore, semi-supervised modeling strategies have been proposed to solve the problem of data modeling under limited labeled data by making full use of both unlabeled and labeled data (Zhu, 2007). Yuan et al. proposed a semi-supervised just-in-time learning framework for soft sensor modeling of nonlinear processes to deal with modeling problems with unequal lengths and few labeled data (Yuan et al., 2017). To tackle the problem of low prediction accuracy caused by insufficient labeled samples, a semi-supervised robust soft sensor modeling method based on the Student's t mixture model was proposed (Shao et al., 2019). To deal with limited labeled data and abundant unlabeled data, a semi-supervised pre-training strategy was designed for deep learning networks based on the semi-supervised stacked autoencoder (Yuan et al., 2020). Sun et al. proposed a method called ensemble semi-supervised gated stacked AE to address the problem that large numbers of unlabeled samples cannot be fully utilized (Sun and Ge, 2021b). In this approach, gate units help to quantify the contributions of different hidden layers by establishing connections between different layers and output layers. Moreover, in actual industrial processes, the continuity of the process leads to strong correlations in process data along the time dimension. Therefore, historical process data contains a large amount of time series feature information. The above semi-supervised methods have achieved good prediction results in industrial applications to some extent.
However, the existing methods still cannot effectively solve three difficult issues in industrial process data modeling: (a) the efficient fusion of unlabeled samples and labeled samples; (b) the non-uniform sampling frequency of labeled data samples; (c) the full utilization of a large amount of historical data.

In order to solve the above issues, this paper proposes a novel semi-supervised LSTM with historical feature fusion attention (HFFA-SSLSTM) algorithm to capture meaningful historical dynamic characteristics in the process data. The semi-supervised learning strategy is introduced into LSTM to fully utilize unlabeled data and mine the temporal sequence features of labeled samples and unlabeled samples with irregular sampling frequencies. Furthermore, a novel historical feature fusion attention (HFFA) mechanism is developed that employs historical hidden features to learn attention scores in order to obtain weighted historical information-related features. In the case of limited labeled data and unlabeled data, the experimental results based on actual industrial process data show that the proposed algorithm can effectively predict the long-term and short-term changes of key quality variables, which solves the lag problem caused by off-line testing and avoids the instability of industrial process adjustment. The main contributions of our paper are given as follows.

  • (1) A semi-supervised strategy is introduced into the LSTM model, which is called SSLSTM, to fully mine the temporal features of labeled samples and unlabeled samples with irregular sampling frequencies.

  • (2) A novel historical feature fusion attention (HFFA) mechanism is designed to discover the correlation between the important information in the historical time data and the current time feature vector.

  • (3) The extracted correlation is preserved to guide SSLSTM in efficiently and directly extracting the spatiotemporal hidden feature information of the time series.

  • (4) Extensive experimental results on real-world industrial data sets validate the effectiveness of the proposed method and its applicability in real industrial processes.
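
The idea behind contribution (2), scoring historical hidden features against the current feature vector and forming a weighted summary, can be illustrated with a generic attention sketch. The scoring function used in HFFA is not given in this part of the paper, so the scaled dot-product score, the concatenation used for fusion, and the names history_attention, h_hist and h_cur are all assumptions made only for illustration.

```python
import numpy as np

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

def history_attention(h_hist, h_cur):
    """Generic attention over past hidden states (illustrative, not HFFA itself).

    h_hist: (T, H) array of historical hidden states.
    h_cur:  (H,) current hidden state.
    """
    # Assumed score: scaled dot product between each past state and the current one.
    scores = h_hist @ h_cur / np.sqrt(h_cur.shape[0])
    weights = softmax(scores)            # attention weights, sum to 1
    context = weights @ h_hist           # weighted historical feature, shape (H,)
    # Assumed fusion: concatenate current state and weighted history.
    fused = np.concatenate([h_cur, context])
    return weights, context, fused

rng = np.random.default_rng(0)
h_hist = rng.normal(size=(6, 4))         # six past steps, hidden size 4
h_cur = rng.normal(size=4)
weights, context, fused = history_attention(h_hist, h_cur)
```

In a soft sensor, the fused vector would typically feed a regression layer; whether the paper fuses features this way is not stated here.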

The remaining sections of this paper are structured as follows. Section 2 introduces LSTM and the attention mechanism. Then, the semi-supervised LSTM is reviewed and the proposed HFFA-SSLSTM is described in detail in Section 3. After that, the effectiveness and feasibility of the proposed approach are demonstrated on a real industrial case in Section 4. Finally, conclusions are given in Section 5.

Section snippets

LSTM

Owing to its memory function, the RNN has achieved great success in processing time series data. However, the RNN still suffers from problems such as vanishing or exploding gradients and long-term memory loss. To overcome these shortcomings, LSTM emerged as a variant of the standard RNN model. The internal structure of the LSTM unit is shown in Fig. 1. LSTM is composed of three gate controllers, namely the input gate, forget gate and output gate. Each gate is responsible for a specific
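
The three-gate structure named above can be made concrete with one forward step of the commonly used LSTM cell formulation. This is a minimal NumPy sketch, not the authors' implementation: the function name lstm_step, the stacked parameter layout (W, U, b) and the gate ordering inside it are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    x_t: (D,) input, h_prev/c_prev: (H,) previous hidden and cell states.
    W: (4H, D), U: (4H, H), b: (4H,). Assumed stacking order of the rows:
    input gate, forget gate, output gate, candidate cell state.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much cell state to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate cell state
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Run a short random sequence through the cell.
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, D)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```

The additive update of the cell state c_t is what lets gradients flow over long horizons more easily than in the basic RNN.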

Semi-supervised LSTM network

In actual industrial processes, harsh production environments and expensive online monitoring equipment are common. Therefore, it is necessary to send samples to the laboratory for testing to obtain their label values, which takes a long time and incurs a high testing cost. Moreover, due to the limited number of measuring instruments, there is a large amount of unlabeled data in industrial processes. In addition, due to the different attribute

Case study

The time series regression modeling framework based on HFFA-SSLSTM is shown in Fig. 4. First, the obtained industrial process data is divided along the time series, with the earlier part forming the training set and the later part the testing set. The training set includes labeled data and unlabeled data. The labeled data include easy-to-measure input variables and difficult-to-measure target variables, while the unlabeled data contain only easy-to-measure input variables. Then, all the data in the
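
The data preparation described above, a split along time with a training set that mixes labeled and unlabeled samples, can be sketched as follows. The 80/20 ratio, the use of NaN to mark missing laboratory values, and the helper names chronological_split and masked_mse are assumptions for illustration. The masked loss shows one common way to restrict the supervised error to labeled samples; the training objective actually used for SSLSTM is not shown in this excerpt.

```python
import numpy as np

def chronological_split(X, y, train_fraction=0.8):
    """Split time-ordered samples without shuffling.

    X: (N, D) process variables; y: (N,) quality variable, with NaN marking
    samples that have no laboratory value (an assumed convention).
    """
    n_train = int(len(X) * train_fraction)
    X_train, y_train = X[:n_train], y[:n_train]
    X_test, y_test = X[n_train:], y[n_train:]
    labeled = ~np.isnan(y_train)   # True where a label is available
    return X_train, y_train, labeled, X_test, y_test

def masked_mse(y_pred, y_true, labeled):
    """Mean squared error computed over labeled samples only."""
    return float(np.mean((y_pred[labeled] - y_true[labeled]) ** 2))

# Ten time-ordered samples; only every third one has a laboratory label.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.full(10, np.nan)
y[::3] = [1.0, 2.0, 3.0, 4.0]
X_train, y_train, labeled, X_test, y_test = chronological_split(X, y)
loss = masked_mse(np.zeros(len(y_train)), y_train, labeled)
```

Keeping the original order matters here: shuffling before the split would leak future samples into training and break the sequences an LSTM relies on.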

Concluding remarks

In this paper, a semi-supervised LSTM based on the HFFA method is proposed for temporal sequence dynamic modeling in industrial processes. In the HFFA-SSLSTM model, not only are both the labeled and unlabeled data of the data sequence used to learn the dynamic hidden feature states, but a new HFFA mechanism is also proposed to capture important temporal sequence feature information in historical process data. In this way, the proposed HFFA-SSLSTM can obtain the most relevant hidden feature vectors

CRediT authorship contribution statement

Yiyin Tang: Conceptualization, Methodology, Review & editing. Yalin Wang: Resources, Software, Writing – original draft. Chenliang Liu: Writing – original draft, Review & editing, Validation. Xiaofeng Yuan: Formal analysis, Investigation. Kai Wang: Investigation, Validation. Chunhua Yang: Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (NSFC) (U1911401), in part by the National Key Research and Development Program of China (2020YFB1713800), in part by the Natural Science Foundation of Hunan Province in China (2021JJ10065, 2022JJ20079), in part by the Training Plan of Outstanding Innovative Youngest of Changsha in China (kq2107007), and in part by the science and technology innovation Program of Hunan Province, PR China (2021RC4054).
