Abstract
In the modern chemical industry, collected process data are high-dimensional and complex. Statistical process monitoring models usually incorporate all measured variables because the models generally perform dimension reduction themselves. However, including variables that carry no useful information about faults, that is, variables irrelevant to faults, may degrade monitoring performance. Moreover, typical process monitoring methods build the offline model from normal data only, without any fault information, so monitoring performance is unlikely to be optimal. Hence, a novel stacked sparse autoencoder (SSAE) monitoring model based on fault-related variable selection is proposed. Because the correlations among measured variables change when a fault occurs, these changes are used to select strongly fault-related variables. Mutual information is used to quantify the correlations between measured variables in both normal and fault data. Euclidean distance is adopted as a similarity index that compares each variable's correlation vector in the normal state with its correlation vector in the fault state. Only variables strongly related to fault effects are retained; the other, uninformative variables are excluded from model development. SSAEs are then used to construct a monitoring model on the selected variables. By using historical fault data to select strongly fault-related variables, the proposed method ensures that the model contains useful process information and that the features extracted by the SSAE are highly interpretable. A case study on the Tennessee–Eastman process demonstrates the effectiveness of the method.
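The variable-selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the histogram-based mutual information estimator, the number of bins, the selection threshold, the synthetic data, and all function names are assumptions made for illustration, since the abstract does not specify them. Data are passed as a list of columns, one list of samples per measured variable.

```python
import math
import random
from collections import Counter


def _bin_indices(values, n_bins):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against constant columns
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=8):
    """Histogram estimate of I(X; Y) in nats (assumed estimator)."""
    n = len(x)
    bx, by = _bin_indices(x, n_bins), _bin_indices(y, n_bins)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # p_ij * log(p_ij / (p_i * p_j)) with p = count / n
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def mi_matrix(columns, n_bins=8):
    """Row i is the correlation vector of variable i against all variables."""
    m = len(columns)
    return [[mutual_information(columns[i], columns[j], n_bins)
             for j in range(m)] for i in range(m)]


def fault_relevance(normal, fault, n_bins=8):
    """Euclidean distance between each variable's normal and fault MI vectors."""
    rows_n, rows_f = mi_matrix(normal, n_bins), mi_matrix(fault, n_bins)
    return [math.dist(rn, rf) for rn, rf in zip(rows_n, rows_f)]


def select_variables(normal, fault, threshold, n_bins=8):
    """Keep variables whose correlation vector moved more than `threshold`.

    The threshold rule is a hypothetical choice, not taken from the paper.
    """
    scores = fault_relevance(normal, fault, n_bins)
    return [i for i, s in enumerate(scores) if s > threshold]


def make_synthetic_data(n=2000, seed=0):
    """Toy data: variables 0 and 1 are coupled in normal operation only."""
    rng = random.Random(seed)

    def column():
        return [rng.random() for _ in range(n)]

    x0 = column()
    normal = [x0, [v + 0.05 * rng.random() for v in x0], column(), column()]
    fault = [column(), column(), column(), column()]  # coupling is broken
    return normal, fault


if __name__ == "__main__":
    normal, fault = make_synthetic_data()
    print(fault_relevance(normal, fault))
    print(select_variables(normal, fault, threshold=0.5))
```

In this toy setting the fault breaks the dependence between variables 0 and 1, so their mutual information vectors shift far more than those of the two unaffected variables. In practice the threshold would have to be tuned on historical fault data, and the selected columns would then be passed to the SSAE for model training.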
Acknowledgements
The authors are grateful for the support of the National Natural Science Foundation of China (21878081) and the Fundamental Research Funds for the Central Universities of China (Grant 222201917006).
Ethics declarations
Conflict of interest
The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Additional information
Communicated by V. Loia.
Cite this article
Yin, J., Yan, X. Stacked sparse autoencoders monitoring model based on fault-related variable selection. Soft Comput 25, 3531–3543 (2021). https://doi.org/10.1007/s00500-020-05384-8
DOI: https://doi.org/10.1007/s00500-020-05384-8