Abstract
As the complexity of chemical processes continues to increase, identifying accurate process models has become a significant task in automatic control and optimal design. The effectiveness of deep-learning-based chemical process identification has been verified in recent years. To address the characteristics of chemical processes, such as temporal correlation, nonlinearity, high dimensionality and strong coupling, a chemical process modeling method that combines a spatio-temporal attention long short-term memory structure with a novel second-order optimization algorithm (STA-SO-LSTM) is proposed. First, a second-order LSTM back-propagation algorithm is proposed to improve model accuracy and training speed. This optimization algorithm uses the second-derivative information of the neuron activation function together with gradient information to estimate the inverse Hessian matrix without a matrix inversion operation. Then, considering the correlations and temporal characteristics among different chemical process variables, a model combining spatio-temporal attention and LSTM is adopted. To demonstrate the efficiency of the developed network structure and algorithm, comparative experiments are conducted on the Tennessee-Eastman (TE) process and fractionator datasets. The results show that the structure combining LSTM with a spatio-temporal attention mechanism performs well in establishing dynamic models, and that, compared with several traditional and recently proposed optimization algorithms, the proposed second-order algorithm achieves higher accuracy and faster convergence.
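The paper's optimizer itself is not reproduced here; as a minimal sketch of the underlying idea the abstract describes, the following code estimates an inverse Hessian from successive gradient differences via a BFGS-style rank-two update, so no explicit matrix inversion is ever performed. The function names and the toy quadratic objective are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def quasi_newton_minimize(grad, x0, steps=100, lr=1.0, tol=1e-8):
    """Minimize a smooth function given only its gradient.

    Maintains H, an estimate of the inverse Hessian, updated from
    gradient differences (BFGS update) -- no matrix inversion needed.
    """
    n = x0.size
    I = np.eye(n)
    H = np.eye(n)                      # initial inverse-Hessian estimate
    x, g = x0.copy(), grad(x0)
    for _ in range(steps):
        p = -H @ g                     # quasi-Newton search direction
        x_new = x + lr * p
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g    # step and gradient change
        sy = s @ y
        if sy > 1e-12:                 # curvature condition keeps H positive definite
            rho = 1.0 / sy
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
        if np.linalg.norm(g) < tol:
            break
    return x

# Toy strongly convex quadratic: f(x) = 0.5 x^T A x - b^T x, minimizer solves A x = b.
A = np.array([[3.0, 0.5], [0.5, 2.0]])
b = np.array([1.0, -1.0])
x_star = quasi_newton_minimize(lambda x: A @ x - b, np.zeros(2))
```

The curvature check (`sy > 1e-12`) is what lets the estimate stay positive definite without ever forming or inverting the true Hessian, which is the property second-order LSTM training methods of this kind exploit at scale.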
Acknowledgements
This work is supported by the Strategic Cooperation Technology Projects of CNPC and CUPB (ZLZX2020-03), and the National Key Research and Development Project (2019YFA0708304).
Ethics declarations
Conflict of interest
We declare that we have no potential competing interests related to this paper. All authors have seen the manuscript and approved its submission to the journal. We confirm that the content of the manuscript has not been published or submitted for publication elsewhere.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Xu, B., Wang, Y., Yuan, L. et al. A novel second-order learning algorithm based attention-LSTM model for dynamic chemical process modeling. Appl Intell 53, 1619–1639 (2023). https://doi.org/10.1007/s10489-022-03515-2