Semi-supervised LSTM with historical feature fusion attention for temporal sequence dynamic modeling in industrial processes
Introduction
In modern industrial processes, real-time measurements of key quality variables are important indicators for evaluating product quality and ensuring production safety (Zhou et al., 2021, Islam et al., 2022, Fang et al., 2021). Timely measurement of these variables is therefore essential for efficient industrial production. However, because measuring instruments are expensive and industrial site environments are harsh, quality variables are difficult to measure online in actual industrial processes (Li et al., 2022, Seifi et al., 2022, Liu et al., 2020). Instead, workers usually collect product samples on site with sampling equipment, and the samples are sent to the laboratory for analysis to obtain the corresponding test values. This off-line laboratory analysis has a long sampling period and a time-consuming test procedure, so the results arrive too late to guide on-site operation. Soft sensor technology alleviates this problem by estimating quality variables online (Alcántara et al., 2022, Parvez et al., 2022). In recent years, with the development of distributed control systems (DCS), the number of measurement and actuation devices in industrial processes has continued to grow, so large amounts of process data are now collected and stored. As a result, data-driven soft sensors have been widely used for quality prediction in industrial processes (Yao and Ge, 2019, Liu et al., 2022). The core of a data-driven soft sensor is a mathematical model that relates difficult-to-measure quality variables to easy-to-measure process variables, so that the former can be predicted from the latter (Sun and Ge, 2021a, Sun et al., 2022a).
Predictive models built with traditional linear regression are suitable for simple systems but may not capture the behavior of complex process systems (Ge, 2015, Deng et al., 2022). The rapid development and maturing theory of machine learning offer a promising way to address this limitation. Commonly used machine learning methods, including decision trees (Yeo and Grant, 2018), random forests (Chai and Zhao, 2020), artificial neural networks (ANN) (Liu et al., 2021), and Bayesian learning (Zeng and Sycara, 1998), have been applied to data-driven soft sensor modeling. However, these methods all rely on shallow structures, which limits their ability to represent complex nonlinear processes.
Since its emergence in 2006, deep learning has attracted widespread attention in artificial intelligence and machine learning (Hinton, 2007). Unlike traditional shallow networks, deep learning stacks multiple hidden layers so that low-level features are progressively abstracted into high-level features, thereby extracting deep features from the data. Its design is loosely inspired by the way the human brain analyzes and learns. Widely used deep networks include the multi-layer perceptron (MLP) (Shi et al., 2021), convolutional neural network (CNN) (Long et al., 2015), deep belief network (DBN) (Lian et al., 2020), and stacked autoencoder (SAE) (Sun and Ge, 2022). To date, deep learning has received increasing attention and found extensive applications in industrial data modeling tasks such as quality prediction and process monitoring (Sun et al., 2022b, Huang et al., 2020).
Although these networks have achieved some success in extracting nonlinear features from industrial process data, industrial process data are sampled continuously along the time axis, so these networks are not directly suited to them. Moreover, because the networks above are all static, they have difficulty extracting useful temporal information and dynamic features from time-series process data. The recurrent neural network (RNN) is a sequence model with great potential for time-series modeling and is widely used to describe the temporal dynamic behavior of sequential data (Lipton et al., 2015). For example, Su et al. used an RNN to model the curing degree during the production of graphite fiber composites (Su et al., 1998). However, the basic RNN suffers from vanishing and exploding gradients, which makes long sequences difficult to model. Hochreiter and Schmidhuber therefore proposed the long short-term memory (LSTM) network to learn long-term dependencies in time-series data (Hochreiter and Schmidhuber, 1997). Owing to its strong ability to extract features from time-series data, LSTM has been widely used for sequence modeling, including in industrial processes. For example, Shi et al. proposed the convolutional LSTM (ConvLSTM) to model time series of radar echo maps for precipitation nowcasting (Shi et al., 2015). To capture feature dependencies in data fluctuations, a multi-sequence feature LSTM method that makes predictions from feature-temporal patterns was proposed (Wang et al., 2020b). Wang et al. proposed an LSTM-SAE deep learning framework for quality prediction, in which LSTM extracts comprehensive quality-relevant hidden features from a long sequence at each stage (Wang et al., 2020a).
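To make the vanishing-gradient problem concrete, the following toy example (a hypothetical scalar RNN, not a model from this paper) computes dh_T/dh_0 for h_t = tanh(w h_{t-1} + x_t). By the chain rule this derivative is a product of T local factors w(1 - h_t^2); when |w| < 1 each factor is smaller than one in magnitude, so the gradient decays geometrically with sequence length.

```python
import math


def rnn_gradient_factor(w: float, x_seq, h0: float = 0.0) -> float:
    """Return dh_T/dh_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t).

    By the chain rule, dh_T/dh_0 is the product over t of w * (1 - h_t**2).
    """
    h = h0
    grad = 1.0
    for x in x_seq:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # local derivative dh_t/dh_{t-1}
    return grad


# Hypothetical recurrent weight and constant input, two sequence lengths.
grad_short = rnn_gradient_factor(0.5, [0.1] * 5)
grad_long = rnn_gradient_factor(0.5, [0.1] * 50)
print(f"T=5:  {grad_short:.3e}")
print(f"T=50: {grad_long:.3e}")
```

With w = 0.5 the 50-step gradient is bounded by 0.5^50 (about 8.9e-16), so early inputs barely influence the training of a plain RNN on long sequences. The additive cell-state update in LSTM is designed to mitigate this.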
In addition, because labeling process data in actual industrial processes is costly and time-consuming, labeled samples are relatively scarce (Chen et al., 2022), whereas abundant unlabeled data are available for modeling. Semi-supervised modeling strategies address this situation by making full use of both labeled and unlabeled data (Zhu, 2007). Yuan et al. proposed a semi-supervised just-in-time learning framework for soft sensor modeling of nonlinear processes with unequal-length data and few labeled samples (Yuan et al., 2017). To tackle the low prediction accuracy caused by insufficient labeled samples, a semi-supervised robust soft sensor modeling method based on the Student's t mixture model was proposed (Shao et al., 2019). To handle limited labeled data and abundant unlabeled data, a semi-supervised pre-training strategy based on a semi-supervised stacked autoencoder was designed for deep networks (Yuan et al., 2020). Sun and Ge proposed an ensemble semi-supervised gated stacked autoencoder to address the problem that large numbers of unlabeled samples cannot be fully utilized (Sun and Ge, 2021b). In this approach, gate units quantify the contributions of different hidden layers by connecting each hidden layer to the output layer. These semi-supervised methods have achieved good prediction results in industrial applications to some extent. In actual industrial processes, however, the continuity of operation produces strong temporal correlations in process data, so historical process data contain a large amount of time-series feature information.
However, existing methods still cannot effectively solve three issues in industrial process data modeling: (a) efficient fusion of unlabeled and labeled samples; (b) the non-uniform sampling frequency of labeled samples; and (c) the underuse of large amounts of historical data.
To solve these issues, this paper proposes a novel semi-supervised LSTM with historical feature fusion attention (HFFA-SSLSTM) to capture meaningful historical dynamic characteristics in process data. A semi-supervised learning strategy is introduced into LSTM to make full use of unlabeled data and to mine the temporal features of labeled and unlabeled samples with irregular sampling frequencies. Furthermore, a novel historical feature fusion attention (HFFA) mechanism is developed that uses historical hidden features to learn attention scores and thereby obtain weighted features related to historical information. With limited labeled data and abundant unlabeled data, experimental results on actual industrial process data show that the proposed algorithm can effectively predict the long-term and short-term changes of key quality variables, which alleviates the lag caused by off-line testing and avoids unstable adjustment of the industrial process. The main contributions of this paper are as follows.
- (1) A semi-supervised strategy is introduced into the LSTM model, yielding SSLSTM, to fully mine the temporal features of labeled and unlabeled samples with irregular sampling frequencies.
- (2) A novel historical feature fusion attention (HFFA) mechanism is designed to discover the correlation between important information in historical time data and the feature vector at the current time.
- (3) The extracted correlation is preserved to guide SSLSTM in efficiently and directly extracting spatiotemporal hidden feature information from the time series.
- (4) Extensive experiments on real-world industrial data sets validate the effectiveness of the proposed method and its applicability to real industrial processes.
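The HFFA idea of scoring historical hidden features against the current one and fusing the weighted result can be illustrated with a small attention sketch. The equations of HFFA are not given in this excerpt, so the following is only a minimal sketch under stated assumptions: a bilinear score between each historical hidden state and the current state, softmax normalization, and a tanh fusion layer. The names `historical_attention_fusion`, `W_a`, and `W_f` are hypothetical, and the authors' actual scoring and fusion functions may differ.

```python
import numpy as np


def historical_attention_fusion(h_hist, h_t, W_a, W_f):
    """Hypothetical attention over historical hidden states (not the paper's exact HFFA).

    h_hist : (k, d) hidden states from the k previous time steps
    h_t    : (d,)   hidden state at the current time step
    W_a    : (d, d) bilinear scoring matrix (assumed; learnable in practice)
    W_f    : (d, 2d) fusion matrix mapping [context; h_t] back to d dimensions
    """
    scores = h_hist @ (W_a @ h_t)        # (k,) relevance of each past state to h_t
    scores = scores - scores.max()       # shift for a numerically stable softmax
    e = np.exp(scores)
    alpha = e / e.sum()                  # attention weights, non-negative, sum to 1
    context = alpha @ h_hist             # (d,) weighted historical feature
    fused = np.tanh(W_f @ np.concatenate([context, h_t]))  # fuse history with the present
    return fused, alpha


# Hypothetical sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
d, k = 4, 6
h_hist = rng.standard_normal((k, d))
h_t = rng.standard_normal(d)
W_a = 0.1 * rng.standard_normal((d, d))
W_f = 0.1 * rng.standard_normal((d, 2 * d))
fused, alpha = historical_attention_fusion(h_hist, h_t, W_a, W_f)
print("attention weights:", np.round(alpha, 3))
```

A bilinear score is used here because it lets the current state decide which past states matter without extra layers; additive (MLP-based) scoring would be an equally valid choice in a sketch like this.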
The rest of this paper is organized as follows. Section 2 introduces LSTM and the attention mechanism. Section 3 overviews the semi-supervised LSTM and describes the proposed HFFA-SSLSTM in detail. Section 4 demonstrates the effectiveness and feasibility of the proposed approach on a real industrial case. Finally, Section 5 gives the conclusions.
Section snippets
LSTM
Owing to its memory function, the RNN has achieved great success in processing time-series data. However, RNNs still suffer from vanishing or exploding gradients and loss of long-term memory. LSTM emerged as a variant of the standard RNN to overcome these shortcomings. The internal structure of an LSTM unit is shown in Fig. 1. An LSTM unit contains three gates, namely the input gate, forget gate, and output gate. Each gate is responsible for a specific
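This excerpt breaks off before the gate equations, so the following sketch uses the standard modern LSTM formulation with input, forget, and output gates. The stacking order of the gate weights in `W` is an arbitrary convention chosen here, and `lstm_step` and `lstm_forward` are hypothetical names, not code from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One forward step of a standard LSTM cell.

    x_t: (n_in,); h_prev, c_prev: (n_h,)
    W: (4*n_h, n_in + n_h) stacked weights in the order [forget, input, candidate, output]
    b: (4*n_h,) stacked biases in the same order
    """
    n_h = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    f = sigmoid(z[0:n_h])              # forget gate: how much of c_prev to keep
    i = sigmoid(z[n_h:2 * n_h])        # input gate: how much new content to write
    g = np.tanh(z[2 * n_h:3 * n_h])    # candidate cell content
    o = sigmoid(z[3 * n_h:4 * n_h])    # output gate: how much of the cell to expose
    c_t = f * c_prev + i * g           # additive cell update eases gradient flow
    h_t = o * np.tanh(c_t)
    return h_t, c_t


def lstm_forward(X, W, b, n_h):
    """Run the cell over a sequence X of shape (T, n_in); return hidden states (T, n_h)."""
    h = np.zeros(n_h)
    c = np.zeros(n_h)
    hs = []
    for x_t in X:
        h, c = lstm_step(x_t, h, c, W, b)
        hs.append(h)
    return np.stack(hs)


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
T, n_in, n_h = 10, 3, 5
X = rng.standard_normal((T, n_in))
W = 0.1 * rng.standard_normal((4 * n_h, n_in + n_h))
b = np.zeros(4 * n_h)
H = lstm_forward(X, W, b, n_h)
print(H.shape)
```

Because h_t is the product of a sigmoid output and a tanh, every hidden value lies strictly in (-1, 1); the cell state c_t, by contrast, is unbounded and carries the long-term memory.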
Semi-supervised LSTM network
In actual industrial processes, harsh production environments and expensive online monitoring equipment are common. Samples therefore have to be sent to the laboratory to obtain their label values, which is time-consuming and costly. Moreover, because the number of measuring instruments is limited, industrial processes produce a large amount of unlabeled data. In addition, due to the different attribute
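This excerpt does not specify how SSLSTM combines labeled and unlabeled samples. One simple scheme consistent with the description, offered here only as an assumption, is to pass every time step (labeled or not) through the recurrent network so that unlabeled samples shape the hidden state, and to compute the supervised loss only at time steps that have a laboratory value. The helper `masked_mse` and the data below are hypothetical.

```python
import numpy as np


def masked_mse(y_pred, y_true, label_mask):
    """Mean squared error restricted to labeled time steps.

    y_pred     : (T,) per-step model outputs (every step is predicted)
    y_true     : (T,) targets, NaN where no laboratory value exists
    label_mask : (T,) boolean, True at labeled time steps
    """
    if not label_mask.any():
        return 0.0  # a window with no labels contributes no supervised loss
    diff = y_pred[label_mask] - y_true[label_mask]
    return float(np.mean(diff ** 2))


# Hypothetical window of 12 time steps with three irregularly spaced lab samples.
T = 12
y_true = np.full(T, np.nan)
y_true[[2, 5, 11]] = [1.0, 1.2, 0.9]
mask = ~np.isnan(y_true)
y_pred = np.linspace(0.8, 1.2, T)  # stand-in for the recurrent model's outputs
loss = masked_mse(y_pred, y_true, mask)
print(f"masked MSE: {loss:.4f}")
```

Masking, rather than dropping unlabeled rows, keeps the time axis intact, so irregular label spacing does not break the sequence the recurrent network sees.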
Case study
The time-series regression modeling framework based on HFFA-SSLSTM is shown in Fig. 4. First, the industrial process data are split chronologically: the earlier part forms the training set and the later part the testing set. The training set includes labeled and unlabeled data. The labeled data contain both easy-to-measure input variables and difficult-to-measure target variables, whereas the unlabeled data contain only easy-to-measure input variables. Then, all the data in the
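The chronological split and the labeled/unlabeled distinction described in this step can be sketched as follows. The 80/20 ratio, the sampling pattern, and the function name `chronological_split` are hypothetical choices for illustration; the case study's actual split sizes are not given in this excerpt. Unlabeled samples are represented by NaN targets.

```python
import numpy as np


def chronological_split(X, y, train_ratio=0.8):
    """Split time-ordered samples without shuffling.

    X : (N, n_in) easy-to-measure process variables, in time order
    y : (N,) quality variable, NaN where the sample is unlabeled
    Returns (X_train, y_train, labeled_train), (X_test, y_test, labeled_test).
    """
    n_train = int(len(X) * train_ratio)
    X_train, X_test = X[:n_train], X[n_train:]   # earlier part trains, later part tests
    y_train, y_test = y[:n_train], y[n_train:]
    return (X_train, y_train, ~np.isnan(y_train)), (X_test, y_test, ~np.isnan(y_test))


# Hypothetical data: 100 time steps, 3 process variables, a lab value every 7th step.
rng = np.random.default_rng(0)
N = 100
X = rng.standard_normal((N, 3))
y = np.full(N, np.nan)
y[::7] = rng.standard_normal(len(y[::7]))
train, test = chronological_split(X, y, train_ratio=0.8)
print(train[0].shape, int(train[2].sum()), test[0].shape, int(test[2].sum()))
# prints (80, 3) 12 (20, 3) 3
```

Not shuffling before the split keeps future samples out of training, which matches how a soft sensor is deployed: it is fitted on past data and then predicts forward in time.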
Concluding remarks
In this paper, a semi-supervised LSTM based on the HFFA mechanism is proposed for temporal sequence dynamic modeling in industrial processes. In the HFFA-SSLSTM model, both labeled and unlabeled data in the sequence are used to learn the dynamic hidden feature states, and a new HFFA mechanism is proposed to capture important temporal feature information in historical process data. In this way, the proposed HFFA-SSLSTM can obtain the most relevant hidden feature vectors
CRediT authorship contribution statement
Yiyin Tang: Conceptualization, Methodology, Review & editing. Yalin Wang: Resources, Software, Writing – original draft. Chenliang Liu: Writing – original draft, Review & editing, Validation. Xiaofeng Yuan: Formal analysis, Investigation. Kai Wang: Investigation, Validation. Chunhua Yang: Project administration.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (NSFC) (U1911401), in part by the National Key Research and Development Program of China (2020YFB1713800), in part by the Natural Science Foundation of Hunan Province in China (2021JJ10065, 2022JJ20079), in part by the Training Plan of Outstanding Innovative Youngest of Changsha in China (kq2107007), and in part by the science and technology innovation Program of Hunan Province, PR China (2021RC4054).
References (39)
- Direct estimation of prediction intervals for solar and wind regional energy forecasting with deep neural networks. Eng. Appl. Artif. Intell. (2022).
- Learning multiple layers of representation. Trends Cogn. Sci. (2007).
- Sparse Bayesian learning for network structure reconstruction based on evolutionary game data. Physica A (2020).
- Long range multi-step water quality forecasting using iterative ensembling. Eng. Appl. Artif. Intell. (2022).
- Knowledge-based operation optimization of a distillation unit integrating feedstock property considerations. Eng. Appl. Artif. Intell. (2022).
- Soft sensor based on DBN-IPSO-SVR approach for rotor thermal deformation prediction of rotary air-preheater. Measurement (2020).
- Deep learning with nonlocal and local structure preserving stacked autoencoder for soft sensor in industrial processes. Eng. Appl. Artif. Intell. (2021).
- Non-ferrous metals price forecasting based on variational mode decomposition and LSTM network. Knowl.-Based Syst. (2020).
- Real-time pattern matching and ranking for early prediction of industrial alarm floods. Control Eng. Pract. (2022).
- Multi-model ensemble prediction of pan evaporation based on the copula Bayesian model averaging approach. Eng. Appl. Artif. Intell. (2022).
- Semi-supervised robust modeling of multimode industrial processes for quality variable prediction based on student’s t mixture model. IEEE Trans. Ind. Inform.
- Productivity prediction of a multilateral-well geothermal system based on a long short-term memory and multi-layer perceptron combinational neural network. Appl. Energy.
- Monitoring the process of curing of epoxy/graphite fiber composites with a recurrent neural network as a soft sensor. Eng. Appl. Artif. Intell.
- A multiphase information fusion strategy for data-driven quality prediction of industrial batch processes. Inform. Sci.
- Distributed parallel deep learning of hierarchical extreme learning machine for multimode quality prediction with big process data. Eng. Appl. Artif. Intell.
- Predicting service industry performance using decision tree analysis. Int. J. Inf. Manage.
- A novel semi-supervised pre-training strategy for deep networks and its application for quality variable prediction in industrial processes. Chem. Eng. Sci.
- Bayesian learning in negotiation. Int. J. Human-Comput. Stud.
- Fast just-in-time-learning recursive multi-output LSSVR for quality prediction and control of multivariable dynamic systems. Eng. Appl. Artif. Intell.