Elsevier

Information Sciences

Volume 596, June 2022, Pages 280-303

Stacked maximal quality-driven autoencoder: Deep feature representation for soft analyzer and its application on industrial processes

https://doi.org/10.1016/j.ins.2022.02.049

Abstract

Deep learning-based soft analyzers, which aim to establish prediction models between quality data and easy-to-measure variables, are important for modern industrial process monitoring and measurement. However, in traditional deep learning methods, the guidance of quality information on feature extraction is insufficient and readily weakens as the data dimension increases. In this paper, a stacked maximal quality-driven autoencoder (SMQAE) is proposed to extract maximal quality-relevant features for soft analyzers. In each maximal quality-driven autoencoder, quality variables are reconstructed together with the input variables in the output layer. The SMQAE ensures that the quality part and the input part have the same influence on the reconstruction. In addition, the maximal information coefficient (MIC), which is not limited to any specific function type, is exploited to enhance the importance of quality-related variables in the input part. With the constraint of the quality equivalence strategy and the MIC-based variable importance evaluation, the SMQAE maximizes the guidance of the quality variables during feature learning regardless of the data dimension. Therefore, the SMQAE can extract quality-relevant features from complex high-dimensional data. The rationality, superiority and robustness of SMQAE-based soft analyzers are validated on four simulated scenarios and two industrial processes.

Introduction

In industrial processes, some key quality variables are difficult to detect online because of harsh environments [28], large detection delays [7], the need for expensive analyzers [17] and so on. Soft analyzers (soft sensors) establish prediction models based on easy-to-monitor variables to measure quality variables indirectly [30]. They generate effective information for industrial process optimization and control in real time. Some soft analyzers are established according to prior knowledge of the physicochemical mechanism of the industrial process and are classified as first-principle models (FPMs). The performance of FPMs relies on complete prior knowledge, which is difficult to acquire in complex modern industrial processes. Conversely, data-driven soft sensors [16] do not depend on detailed mechanistic models. As data techniques and computer science have progressed, data-driven soft sensors, such as principal component regression (PCR) [22], [24], partial least squares (PLS) [8], support vector regression (SVR) [6] and neural networks (NNs) [14], have been widely and successfully applied to industrial processes [9] owing to their high prediction accuracy and low maintenance cost.

Generally, industrial process data are high-dimensional and contain complicated correlations and redundancies. Feature representation is important to soft sensors since it improves the prediction accuracy, enhances model robustness and reduces model complexity. The soft sensor model can then be constructed from the extracted features. Principal component analysis (PCA) [13] and PLS are popular linear feature extraction methods. To represent nonlinear features, variants of PCA and PLS have also been developed, e.g., kernel PCA [3], kernel PLS [26], JITL-PCA [2] and locally weighted PLS [18]. These PCA- and PLS-based feature extraction methods can be considered a kind of shallow learning architecture that contains at most one hidden layer. Although these methods are able to learn the features of simple datasets, they are insufficient for forming feature representations of the large-scale and complicated data of modern industrial processes.
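To make the "extract features, then build the soft sensor on them" pattern concrete, the following is a minimal NumPy sketch of principal component regression, the simplest of the shallow methods mentioned above. The function names `fit_pcr` and `predict_pcr` and the tuple layout of the model are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Fit a PCR soft sensor: PCA on the inputs, least squares on the scores."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean                          # center the easy-to-measure variables
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                  # loadings, shape (d, k)
    T = Xc @ P                               # extracted features (scores), shape (n, k)
    coef, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    return x_mean, y_mean, P, coef

def predict_pcr(model, X):
    """Predict the quality variable from new input samples."""
    x_mean, y_mean, P, coef = model
    return (X - x_mean) @ P @ coef + y_mean

# Synthetic usage example: a noise-free linear quality variable.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 3.0
model_demo = fit_pcr(X_demo, y_demo, n_components=5)
y_pred_demo = predict_pcr(model_demo, X_demo)
```

Because the features here are a single linear projection, the sketch also illustrates why such models are described as shallow: there is only one feature layer between the inputs and the prediction.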

Since Hinton et al. [12] proposed a fast learning algorithm, deep learning has become more effective and applicable for real applications and has achieved breakthroughs in many fields [10], [32], including feature representation [11]. With multiple nonlinear hidden layers, deep learning methods learn hierarchical features progressively. Many deep learning methods have been introduced into soft sensors for industrial processes, such as the stacked autoencoder (SAE) [15], [20], [29], hierarchical extreme learning machine (HELM) [31], long short-term memory (LSTM) [21], NN [19], slow feature analysis (SFA) [25], and deep belief network (DBN) [38].

However, these deep learning methods mainly focus on the features of the raw input variables through unsupervised learning and often ignore important quality information during feature extraction. Quality information is considered only in the modeling process. Moreover, the learned features may also contain irrelevant and redundant information. Although the extracted features represent the raw input variables well, they may be insufficient for quality prediction. Recently, Yuan et al. [33] developed a feature extraction method named the variable-wise weighted SAE (VW-SAE) to solve this problem. The VW-SAE is pretrained with variable weights determined by the Pearson correlation coefficients between the input and quality variables. Thus, the VW-SAE extracts abstract features containing more information from quality-related input variables, which are more suitable for quality prediction. Since the Pearson correlation coefficient measures only linear correlation, the performance of the original VW-SAE is limited. To extend the application of the VW-SAE, Yuan et al. improved its correlation measurement and developed a nonlinear VW-SAE [34] and a hybrid VW-SAE (HVW-SAE) [35] for nonlinear and complex relationships. Wang and Yan [27] introduced a maximal information coefficient-weighted SAE (MICW-SAE), combined with a variable selection strategy, to establish soft sensor models for industrial processes. However, because the relationship between the input variables and the quality variables is unknown in practice, it is difficult to evaluate it accurately. Moreover, variable weighting easily fails when the information content of each variable is similar. Yuan, Zhou et al. [37] constructed a novel stacked quality-driven autoencoder (SQAE). In each sub-module, the output layer is composed of both the input and the quality variables and reconstructs them simultaneously as well as possible. In a hierarchical stacked manner, the quality variables drive the network to extract quality-relevant features. However, the SQAE has two main limitations. First, the extracted features represent information from all the input variables, so irrelevant information may still be learned by the SQAE. Second, and more importantly, the dimension of the input variables is generally much larger than that of the quality variables. Because the SQAE reconstructs each variable with the same accuracy, the proportion of the quality part in the reconstruction is very small in industrial applications. Hence, the guidance of the quality variables is limited for high-dimensional soft sensor models.
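Two of the limitations discussed above can be checked numerically. The sketch below is illustrative only: `pearson_weights` and `quality_share` are hypothetical helper names, and the weighting shown is a plain absolute Pearson coefficient rather than the exact VW-SAE formulation. It shows that a Pearson-based weight can vanish for a purely nonlinear dependence, and that the share of quality nodes in an equally weighted output layer shrinks as the input dimension grows.

```python
import numpy as np

def pearson_weights(X, y):
    """Absolute Pearson correlation between each input column and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(Xc.T @ yc) / denom

def quality_share(d_x, d_y):
    """Fraction of output nodes that are quality variables
    when every output node is reconstructed with equal weight."""
    return d_y / (d_x + d_y)

# y depends on x only through x**2 on a symmetric grid.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2
X = np.column_stack([x, x ** 2])
weights = pearson_weights(X, y)     # first weight is near zero, second is near one

# One quality variable against a growing number of input variables.
shares = [quality_share(d_x, 1) for d_x in (5, 50, 500)]
```

The first input fully determines `y`, yet its linear correlation is essentially zero, which is the motivation for nonlinear association measures. The `shares` list shows the dimension effect that limits the guidance of the quality variables in an equally weighted reconstruction.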

The extracted features are expected to contain as much quality information as possible for soft analyzers. To alleviate the aforementioned problems, a novel deep learning method named the stacked maximal quality-driven autoencoder (SMQAE) is developed to maximize the guidance of quality information for feature representation. The SMQAE contains multiple maximal quality-driven autoencoders (MQAEs). In each MQAE, the quality variables are reconstructed together with the input variables by the decoder, whose output consists of two parts: the input part and the quality part. By adopting a quality equivalence strategy, the SMQAE ensures that the quality part and the input part have the same influence on the reconstruction. Therefore, the quality information guides the feature extraction without being affected by the data dimension. Ref. [1] demonstrated that taking variable importance into account can effectively improve the performance of the prediction model during supervised learning, especially for high-dimensional data. To enhance the quality information and eliminate irrelevant information during feature extraction, the maximal information coefficient (MIC) [23] is exploited to evaluate the association between the input variables and the quality variables. Next, a weighted reconstruction objective function is constructed for the input part according to the MIC. With the constraint of the quality equivalence strategy and the prioritized reconstruction of the more quality-related variables in the input part, the SMQAE network maximizes quality guidance in the learning of latent features. Compared with the commonly used Pearson and Spearman coefficients, the MIC is not limited to specific function types. Although the MIC may slightly overestimate the importance of the noise part, the reconstruction of the quality variables and the quality equivalence strategy in the SMQAE can effectively reduce this influence. Thus, the SMQAE can effectively improve the adaptability, robustness and accuracy of soft sensors, especially for high-dimensional and complex data. The performance of SMQAE-based soft sensors is validated with simulated scenarios and industrial processes.
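The exact objective function is not given in this excerpt, so the following is only one plausible reading of the two ideas described above, written as a NumPy sketch. It assumes that the MIC scores have already been computed by some external routine and are passed in as `mic`, that the input-part weights are the MIC scores normalized to sum to one, and that "quality equivalence" means the input part and the quality part are each averaged over their own variables before being added. The name `mqae_loss` is hypothetical.

```python
import numpy as np

def mqae_loss(x, x_hat, y, y_hat, mic):
    """Illustrative MQAE-style reconstruction loss (assumed form, not the paper's).

    x, x_hat : (n, d_x) input variables and their reconstruction
    y, y_hat : (n, d_y) quality variables and their reconstruction
    mic      : (d_x,) nonnegative, precomputed association scores
    """
    w = mic / mic.sum()                              # input weights sum to one
    input_part = np.mean(((x - x_hat) ** 2) @ w)     # MIC-weighted average over inputs
    quality_part = np.mean((y - y_hat) ** 2)         # plain average over quality variables
    return input_part + quality_part, input_part, quality_part

# With a unit error on every variable, both parts contribute equally,
# regardless of how many input variables there are.
rng = np.random.default_rng(0)
for d_x in (5, 500):
    total, inp, qual = mqae_loss(
        np.zeros((4, d_x)), np.ones((4, d_x)),
        np.zeros((4, 1)), np.ones((4, 1)),
        rng.uniform(0.1, 1.0, size=d_x),
    )
    print(d_x, inp, qual)
```

Under these assumptions, an error on a variable with a high association score is penalized more than the same error on a weakly associated variable, while the balance between the two parts does not change with the input dimension.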

The rest of this paper is organized as follows. The background for SAEs is briefly revisited in Section 2. In Section 3, the stacked maximal quality-driven autoencoder and the SMQAE-based soft sensor model are described in detail. Then, the performance of the proposed SMQAE is validated and the results are discussed in Section 4. Finally, Section 5 concludes this paper.

Section snippets

Stacked autoencoder

An SAE is a deep learning network built by stacking multiple basic autoencoders (AEs) layer by layer [36]. The basic AE is a three-layer network in which the structure and node values of the output layer are identical to those of the input layer. Thus, the basic AE contains an encoder-decoder pair. Fig. 1 illustrates the architecture of the AE. In general, the hidden layer is taken as the extracted features. Suppose the variable vectors of the input layer, hidden layer and output layer
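The encoder-decoder structure described in this snippet can be sketched in a few lines of NumPy. This is a forward-pass illustration only (no training), and it assumes a sigmoid encoder, a linear decoder and a mean squared reconstruction error, none of which is specified in the excerpt. All function and parameter names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_ae(d_in, d_hidden, rng):
    """Randomly initialize one basic three-layer AE."""
    return {
        "W_enc": rng.normal(scale=0.1, size=(d_in, d_hidden)),
        "b_enc": np.zeros(d_hidden),
        "W_dec": rng.normal(scale=0.1, size=(d_hidden, d_in)),
        "b_dec": np.zeros(d_in),
    }

def encode(params, x):
    """Input layer -> hidden layer (the extracted features)."""
    return sigmoid(x @ params["W_enc"] + params["b_enc"])

def decode(params, h):
    """Hidden layer -> output layer (reconstruction of the input)."""
    return h @ params["W_dec"] + params["b_dec"]

def reconstruction_loss(params, x):
    """Mean squared error between the input and its reconstruction."""
    return np.mean((x - decode(params, encode(params, x))) ** 2)

def stack_encode(layers, x):
    """SAE feature extraction: feed each AE's hidden layer to the next AE."""
    h = x
    for params in layers:
        h = encode(params, h)
    return h

rng = np.random.default_rng(0)
layers = [init_ae(8, 5, rng), init_ae(5, 3, rng)]   # 8 -> 5 -> 3 features
x_demo = rng.normal(size=(10, 8))
features = stack_encode(layers, x_demo)
```

In a full SAE each AE would first be pretrained to minimize its own `reconstruction_loss` on the features produced by the previous AE, after which only the encoders are kept and stacked.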

Stacked maximal quality-driven autoencoder

The SAE learns features that mainly represent the raw input information. As an improved variant, the SQAE introduces quality variables into the output layer of each quality-driven autoencoder (QAE). With the guidance of the quality variables, the deep SQAE captures quality-related features. However, irrelevant variables still affect feature learning, and their information may also be contained in the extracted features. More importantly, the dimension of the input variables is generally much larger than that of the quality variables.

Case study

In this section, the SMQAE is first statistically assessed with three simulated scenarios. To verify the mechanism of the SMQAE, it is compared with the SQAE, the SQAE with the quality equivalence strategy (SMQAE-QE), and variants of the VW-SAE and SMQAE that evaluate variable importance according to the Pearson coefficient, the Spearman coefficient and the MIC (denoted as VWSAE-Pearson, VWSAE-Spearman, VWSAE-MIC, SMQAE-Pearson and SMQAE-Spearman, respectively). Then, the performance of the SMQAE is validated with a numerical

Conclusion

In traditional deep learning methods for feature extraction, the guidance of quality information is insufficient and readily weakens as the data dimension increases. To overcome this shortcoming, a novel deep learning method named SMQAE is proposed in this paper. In the SMQAE, quality data are reconstructed together with the input variables in the output layer of each MQAE module. Therefore, the autoencoders learn features containing more quality

CRediT authorship contribution statement

Junming Chen: Writing – original draft, Methodology, Software. Shaosheng Fan: Writing – review & editing, Supervision. Chunhua Yang: Funding acquisition, Supervision. Can Zhou: Writing – review & editing, Funding acquisition. Hongqiu Zhu: Validation. Yonggang Li: Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the State Key Program of the National Natural Science Foundation of China (Grant No. 61533021), the National Natural Science Foundation of China (Grant Nos. 61773403, 61860206014 and 62103063) and the Natural Science Foundation of Hunan Province (Grant Nos. 2021JJ40608 and 2021JJ30880).
