Integrating virtual sample generation with input-training neural network for solving small sample size problems: application to purified terephthalic acid solvent system

  • Methodologies and Application
Soft Computing

Abstract

Small sample size (SSS) problems pose a major challenge in modeling tasks because training samples are insufficient. This is especially true in the process industry, where thousands of uninformative samples overwhelm a very limited number of valuable ones, degrading the ability of trained models to predict key variables. In this study, the predictive ability of forecasting models is enhanced by generating virtual samples. To account for the combined effects of attributes, a new data augmentation approach called ITNN-VSG, which integrates virtual sample generation (VSG) with an input-training neural network (ITNN), is proposed to enlarge training datasets and thereby improve the performance of forecasting models. First, in the absence of any domain-specific knowledge about the target models, a query-driven interpolation process is developed to explore the overall trend of the data distribution in both sparse and dense regions. Second, an ITNN with fixed weights is used to calculate the input corresponding to each virtual output generated by the interpolation process. To validate the effectiveness of the proposed approach, several in silico experiments were carried out on a benchmark dataset generated from the sinc(x) function, followed by a real-world application to a purified terephthalic acid (PTA) solvent system. The experimental results demonstrated that the proposed approach outperformed existing approaches such as mega-trend-diffusion and tree-based-trend-diffusion.
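The abstract describes a two-stage idea: interpolation produces virtual outputs, and a network whose weights are held fixed is then solved for the input that reproduces each virtual output. The following is a minimal NumPy sketch of that idea on a toy sinc(x) set, not the authors' ITNN-VSG implementation. Everything here is an assumption chosen for illustration: the sinc form sin(x)/x, the sample size and range, the one-hidden-layer tanh forward network and its training settings, and the helper names `train_forward`, `invert` and `virtual_outputs`. In particular, the plain linear interpolation between neighbouring samples is a stand-in for the paper's query-driven process and does not treat sparse and dense regions differently. Only the inversion step (weights frozen, input optimized by gradient descent) mirrors the input-training principle named in the abstract.

```python
import numpy as np


def sinc(x):
    # Assumed benchmark form sin(x)/x; np.sinc(t) computes sin(pi*t)/(pi*t).
    return np.sinc(x / np.pi)


def mlp(x, p):
    """One-hidden-layer tanh network; x has shape (n,), output has shape (n,)."""
    h = np.tanh(np.outer(x, p["W1"]) + p["b1"])  # hidden activations, (n, H)
    return h @ p["W2"] + p["b2"], h


def train_forward(x, y, hidden=10, lr=0.05, epochs=4000, seed=0):
    """Fit the (hypothetical) forward network by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    p = {"W1": rng.normal(0.0, 1.0, hidden), "b1": rng.normal(0.0, 1.0, hidden),
         "W2": rng.normal(0.0, 0.1, hidden), "b2": 0.0}
    n = len(x)
    for _ in range(epochs):
        out, h = mlp(x, p)
        err = out - y                                  # residuals, (n,)
        dh = np.outer(err, p["W2"]) * (1.0 - h ** 2)   # backprop through tanh
        grads = {"W2": h.T @ err / n, "b2": err.mean(),
                 "W1": x @ dh / n, "b1": dh.mean(axis=0)}
        for k in p:
            p[k] = p[k] - lr * grads[k]
    return p


def invert(y_target, x0, p, lr=0.1, steps=200):
    """ITNN-style step: weights in p stay fixed; only the input x is optimized."""
    def loss(t):
        return 0.5 * (mlp(np.array([t]), p)[0][0] - y_target) ** 2

    x = float(x0)
    for _ in range(steps):
        out, h = mlp(np.array([x]), p)
        # d(output)/dx = sum_j W2_j * (1 - h_j^2) * W1_j
        grad = (out[0] - y_target) * np.sum(p["W2"] * (1.0 - h[0] ** 2) * p["W1"])
        step = lr
        # Backtracking: shrink the step until the loss does not increase.
        while step > 1e-8 and loss(x - step * grad) > loss(x):
            step /= 2.0
        if step > 1e-8:
            x -= step * grad
    return x


def virtual_outputs(x, y, per_gap=2):
    """Stand-in interpolation: per_gap evenly spaced points between sorted neighbours.
    Returns (initial input guess, virtual output) pairs."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    pairs = []
    for i in range(len(xs) - 1):
        for k in range(1, per_gap + 1):
            t = k / (per_gap + 1)
            pairs.append(((1 - t) * xs[i] + t * xs[i + 1],
                          (1 - t) * ys[i] + t * ys[i + 1]))
    return pairs


rng = np.random.default_rng(1)
x_train = np.sort(rng.uniform(-10.0, 10.0, 12))  # hypothetical small sample
y_train = sinc(x_train)
scale = 10.0  # scale inputs to [-1, 1] so the tanh units do not saturate
p = train_forward(x_train / scale, y_train)

# Keep each interpolated output; take its input from the frozen network.
virtual = [(invert(y_v, x0 / scale, p) * scale, y_v)
           for x0, y_v in virtual_outputs(x_train, y_train)]
x_aug = np.concatenate([x_train, [v[0] for v in virtual]])
y_aug = np.concatenate([y_train, [v[1] for v in virtual]])
print(f"{len(x_train)} real + {len(virtual)} virtual samples")  # 12 real + 22 virtual samples
```

Because each virtual input comes from inverting the fitted network rather than from the straight chord between two samples, the augmented pairs follow the learned input-output relation. The backtracking loop only accepts input updates that do not increase the inversion loss, so every virtual input fits its target at least as well as the interpolated starting guess.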

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

References

  • Bayar B, Bouaynaya N, Shterenberg R (2017) SMURC: high-dimension small-sample multivariate regression with covariance estimation. IEEE J Biomed Health Inform 21:573–581

  • Blaes S, Burwick T (2017) Few-shot learning in deep networks through global prototyping. Neural Netw 94:159–172

  • Branco P, Torgo L, Ribeiro RP (2016) A survey of predictive modeling on imbalanced domains. ACM Comput Surv 49:1–50

  • Chen J (2018) The quadrilateral Mindlin plate elements using the spline interpolation bases. J Comput Appl Math 329:68–83

  • Chen ZS, Zhu B, He YL, Yu LA (2017) A PSO based virtual sample generation method for small sample sets: Applications to regression datasets. Eng Appl Artif Intell 59:236–243

  • Dias LS, Ierapetritou MG (2016) Integration of scheduling and control under uncertainties: review and challenges. Chem Eng Res Des 116:98–113

  • Diez-Olivan A, Del Ser J, Galar D, Sierra B (2019) Data fusion and machine learning for industrial prognosis: trends and perspectives towards Industry 4.0. Inf Fus 50:92–111

  • Espezua S, Villanueva E, Maciel CD, Carvalho A (2015) A projection pursuit framework for supervised dimension reduction of high dimensional small sample datasets. Neurocomputing 149:767–776

  • Gong HF, Chen ZS, Zhu QX, He YL (2017) A Monte Carlo and PSO based virtual sample generation method for enhancing the energy prediction and energy optimization on small data problem: an empirical study of petrochemical industries. Appl Energy 197:405–415

  • He YL, Wang PJ, Zhang MQ, Zhu QX, Xu Y (2018) A novel and effective nonlinear interpolation virtual sample generation method for enhancing energy prediction and analysis on small data problem: a case study of Ethylene industry. Energy 147:418–427

  • Hong SH, Wang L, Truong TK (2018) Low-complexity direct computation algorithm for cubic-spline interpolation scheme. J Vis Commun Image Represent 50:159–166

  • Huang S et al (2013) A sparse structure learning algorithm for Gaussian Bayesian Network identification from high-dimensional data. IEEE Trans Pattern Anal Mach Intell 35:1328–1342

  • Lee Y, Kang J, Kang B, Ryu KR (2006) Bayesian sampling of virtual examples to improve classification accuracy. In: SICE-ICASE International Joint Conference, IEEE, Busan, South Korea, pp 1009–1014. https://doi.org/10.1109/SICE.2006.315740

  • Li DC, Chen CC, Chang CJ, Lin WK (2012) A tree-based-trend-diffusion prediction procedure for small sample sets in the early stages of manufacturing systems. Expert Syst Appl 39:1575–1581

  • Li DC, Wu CS, Tsai TI, Lina YS (2007) Using mega-trend-diffusion and artificial samples in small data set learning for early flexible manufacturing system scheduling knowledge. Comput Oper Res 34:966–982

  • Li DC, Lin LS (2014) Generating information for small data sets with a multi-modal distribution. Decis Support Syst 66:71–81

  • Li DC, Lin LS, Peng LJ (2014) Improving learning accuracy by using synthetic samples for small datasets with non-linear attribute dependency. Decis Support Syst 59:286–295

  • Li DC, Lin WK, Chen CC, Chen HY, Lin LS (2018) Rebuilding sample distributions for small dataset learning. Decis Support Syst 105:66–76

  • Liu Y, Zhou Y, Liu X, Dong F, Wang C, Wang Z (2019) Wasserstein GAN-based small-sample augmentation for new-generation artificial intelligence: a case study of cancer-staging data in biology. Engineering 5:156–163

  • Martin-Diaz I, Morinigo-Sotelo D, Duque-Perez O, Romero-Troncoso RD (2017) Early fault detection in induction motors using adaboost with imbalanced small data and optimized sampling. IEEE Trans Ind Appl 53:3066–3075

  • Niyogi P, Girosi F, Poggio T (1998) Incorporating prior information in machine learning by creating virtual examples. Proc IEEE 86:2196–2209

  • Ohashi T, Watanabe H, Tokuno J, Katagiri S, Ohsaki M, Matsuda S, Kashioka H (2012) Increasing virtual samples through loss smoothness determination in large geometric margin minimum classification error training. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, Kyoto, Japan, pp 2081–2084. https://doi.org/10.1109/ICASSP.2012.6288320

  • Qin SJ, Chiang LH (2019) Advances and opportunities in machine learning for process data analytics. Comput Chem Eng 126:465–473

  • Reuter C, Brambring F, Weirich J, Kleines A (2016) Improving data consistency in production control by adaptation of data mining algorithms. Procedia CIRP 56:545–550

  • Rodriguez-Amigo MC, Diez-Mediavilla M, Gonzalez-Pena D, Perez-Burgos A, Alonso-Tristan C (2017) Mathematical interpolation methods for spatial estimation of global horizontal irradiation in Castilla-Leon, Spain: A case study. Sol Energy 151:14–21

  • Saez JA, Luengo J, Stefanowski J, Herrera F (2015) SMOTE-IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering. Inf Sci 291:184–203

  • Tan SF, Mavrovouniotis ML (1995) Reducing data dimensionality through optimizing neural-network inputs. AIChE J 41:1471–1480

  • Tang J, Jia M, Liu Z, Chai T, Yu W (2015) Modeling high dimensional frequency spectral data based on virtual sample generation technique. In: IEEE International Conference on Information and Automation, IEEE, Lijiang, China, pp 1090–1095. https://doi.org/10.1109/ICInfA.2015.7279449

  • Tulsyan A, Garvin C, Undey C (2018) Advances in industrial biopharmaceutical batch process monitoring: Machine-learning methods for small data problems. Biotechnol Bioeng 115:1915–1924

  • Van Gorp J, Rolain Y (2000) An interpolation technique for learning with sparse Data. IFAC Proc Vol 33:73–78

  • Zhang Y, Ling C (2018) A strategy to apply machine learning to small datasets in materials science. NPJ Comput Mater 4:25

  • Zhao Y, Ma R, Wen X (2011) Construct virtual samples for improving kernel PCA. In: International Conference on Multimedia and Signal Processing, IEEE, Guilin, China, pp 325–328. https://doi.org/10.1109/CMSP.2011.72

  • Zhu B, Chen ZS, He YL, Yu LA (2017a) A novel nonlinear functional expansion based PLS (FEPLS) and its soft sensor application. Chemom Intell Lab Syst 161:108–117

  • Zhu FY, Ma ZY, Li XX, Chen G, Chien JT, Xue JH, Guo J (2019) Image-text dual neural network with decision strategy for small-sample image classification. Neurocomputing 328:182–188

  • Zhu JL, Ge ZQ, Song ZH, Gao FR (2018) Review and big data perspectives on robust data mining approaches for industrial process modeling with outliers and missing data. Annu Rev Control 46:107–133

  • Zhu Q, Chen Z, Zhang X, Abbas R, Xu Y, Chen Y (2020) Dealing with small sample size problems in process industry using virtual sample generation: a Kriging-based approach. Soft Comput 24(9):6889–6902

  • Zhu QX, Gong HF, Xu Y, He YL (2017) A bootstrap based virtual sample generation method for improving the accuracy of modeling complex chemical processes using small datasets. In: 6th Data Driven Control and Learning Systems, IEEE, Chongqing, China. https://doi.org/10.1109/DDCLS.2017.8068049

  • Zhu QX, Li CF (2006) Dimensionality reduction with input training neural network and its application in chemical process modelling. Chin J Chem Eng 14:597–603

Acknowledgements

Many thanks to Andy Koswara, Botond Szilagyi, and Kanjakha Pal at the Davidson School of Chemical Engineering, Purdue University, for invaluable discussions and advice. This research was partly funded by the National Natural Science Foundation of China (Grant Nos. 61973024, 61973022, and 61703027), the China Scholarship Council State-Sponsored Scholarship Program (Grant Nos. 201806880024 and 201806885004), the Fundamental Research Funds for the Central Universities (Grant No. JD1808), and the Open Research Fund of the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University (Grant No. 18I01).

Author information

Corresponding author

Correspondence to Qun-Xiong Zhu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

No individual participants are included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, ZS., Zhu, QX., Xu, Y. et al. Integrating virtual sample generation with input-training neural network for solving small sample size problems: application to purified terephthalic acid solvent system. Soft Comput 25, 6489–6504 (2021). https://doi.org/10.1007/s00500-021-05641-4

  • DOI: https://doi.org/10.1007/s00500-021-05641-4
