Integrating virtual sample generation with input-training neural network for solving small sample size problems: application to purified terephthalic acid solvent system

Chen, Zhong-Sheng; Zhu, Qun-Xiong; Xu, Yuan; He, Yan-Lin; Su, Qing-Lin; Liu, Yiqing C.; Nagy, Zoltan K.

doi:10.1007/s00500-021-05641-4

Methodologies and Application
Published: 27 February 2021

Soft Computing, volume 25, pages 6489–6504 (2021)

Zhong-Sheng Chen (ORCID: orcid.org/0000-0001-9659-0751)^1,2,3,
Qun-Xiong Zhu^1,2,
Yuan Xu^1,2,
Yan-Lin He^1,2,
Qing-Lin Su^3,
Yiqing C. Liu^3 &
Zoltan K. Nagy^3

Abstract

Small sample size (SSS) problems pose a major challenge in modeling tasks because training samples are insufficient, especially in the process industry, where thousands of useless samples overwhelm the very few valuable ones and degrade the ability of trained models to predict key variables. In this study, the prediction ability of forecasting models is enhanced by generating virtual samples. Considering the integrated effects of attributes, a new data augmentation approach called ITNN-VSG, which integrates virtual sample generation (VSG) with an input-training neural network (ITNN), is put forward to enlarge training datasets and thereby improve the performance of forecasting models. First, in the absence of any domain-specific knowledge about the target models, a query-driven interpolation process is developed to explore the overall tendency of the data distribution in both sparse and dense regions. Second, an ITNN with fixed weights is used to calculate the input corresponding to each virtual output generated by the interpolation process. To validate the effectiveness of the proposed approach, several in silico experiments were carried out on a benchmark dataset from the sinc(x) function, followed by a real-world application to a purified terephthalic acid (PTA) solvent system. The experimental results demonstrate that the proposed approach outperforms existing approaches such as mega-trend-diffusion and tree-based-trend-diffusion.
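The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the query-driven interpolation is replaced here by plain midpoint interpolation between neighbouring outputs, the network is a small one-hidden-layer tanh model trained by ordinary gradient descent, and all names (`train`, `input_training`, `x_small`, the layer size, learning rates, and step counts) are hypothetical choices made for illustration. What the sketch does show is the input-training idea: the weights stay fixed while the inputs are adjusted by gradient descent until the network output approaches each virtual output.

```python
import numpy as np

rng = np.random.default_rng(0)

def sinc(x):
    # sin(x)/x with sinc(0) = 1; np.sinc computes sin(pi*x)/(pi*x).
    return np.sinc(x / np.pi)

def init_params(n_hidden=10):
    # Hypothetical tiny network: 1 input, n_hidden tanh units, 1 linear output.
    return {"W1": rng.normal(0.0, 1.0, (1, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, 0.5, (n_hidden, 1)), "b2": np.zeros(1)}

def forward(p, x):
    h = np.tanh(x @ p["W1"] + p["b1"])
    return h, h @ p["W2"] + p["b2"]

def train(p, x, y, lr=0.05, epochs=5000):
    # Ordinary training: the weights are updated, the inputs are fixed.
    n = len(x)
    for _ in range(epochs):
        h, y_hat = forward(p, x)
        err = (y_hat - y) / n                      # d(0.5 * MSE) / d(y_hat)
        dh = (err @ p["W2"].T) * (1.0 - h ** 2)    # back-propagate through tanh
        p["W2"] -= lr * (h.T @ err)
        p["b2"] -= lr * err.sum(axis=0)
        p["W1"] -= lr * (x.T @ dh)
        p["b1"] -= lr * dh.sum(axis=0)
    return p

def input_training(p, y_target, x_init, lr=0.02, steps=3000):
    # Input training: the weights are fixed, the inputs are updated.
    x = x_init.copy()
    for _ in range(steps):
        h, y_hat = forward(p, x)
        err = y_hat - y_target                     # per-sample output error
        x -= lr * ((err @ p["W2"].T) * (1.0 - h ** 2)) @ p["W1"].T
    return x

# Small real sample set drawn from the sinc benchmark.
x_small = np.linspace(-6.0, 6.0, 12).reshape(-1, 1)
y_small = sinc(x_small)
params = train(init_params(), x_small, y_small)

# Step 1 (simplified assumption): virtual outputs by midpoint interpolation.
y_virtual = 0.5 * (y_small[:-1] + y_small[1:])
x_start = 0.5 * (x_small[:-1] + x_small[1:])

# Step 2: recover the inputs that the fixed network maps to those outputs.
x_virtual = input_training(params, y_virtual, x_start)

_, y_before = forward(params, x_start)
_, y_after = forward(params, x_virtual)
mse_before = float(np.mean((y_before - y_virtual) ** 2))
mse_after = float(np.mean((y_after - y_virtual) ** 2))
print("output mismatch before input training:", mse_before)
print("output mismatch after input training: ", mse_after)
```

The pairs `(x_virtual, y_virtual)` could then be appended to the original small dataset before refitting a forecasting model. Whether midpoint interpolation is an adequate stand-in for the query-driven process depends on details of the method that the abstract does not give.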

Acknowledgements

Many thanks to Andy Koswara, Botond Szilagyi, and Kanjakha Pal at the Davidson School of Chemical Engineering, Purdue University, for invaluable discussions and advice. This research was partly funded by the National Natural Science Foundation of China (Grant Nos. 61973024, 61973022, and 61703027), the China Scholarship Council State-Sponsored Scholarship Program (Grant Nos. 201806880024 and 201806885004), the Fundamental Research Funds for the Central Universities (Grant No. JD1808), and the Open Research Fund of the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University (Grant No. 18I01).

Author information

Authors and Affiliations

College of Information Science & Technology, Beijing University of Chemical Technology, Beijing, 100029, China
Zhong-Sheng Chen, Qun-Xiong Zhu, Yuan Xu & Yan-Lin He
Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing, 100029, China
Zhong-Sheng Chen, Qun-Xiong Zhu, Yuan Xu & Yan-Lin He
Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN, 47907, USA
Zhong-Sheng Chen, Qing-Lin Su, Yiqing C. Liu & Zoltan K. Nagy

Corresponding author

Correspondence to Qun-Xiong Zhu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

No individual participants are included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, ZS., Zhu, QX., Xu, Y. et al. Integrating virtual sample generation with input-training neural network for solving small sample size problems: application to purified terephthalic acid solvent system. Soft Comput 25, 6489–6504 (2021). https://doi.org/10.1007/s00500-021-05641-4

Accepted: 28 January 2021
Published: 27 February 2021
Issue Date: April 2021
DOI: https://doi.org/10.1007/s00500-021-05641-4
