Building robust models for small data containing nominal inputs and continuous outputs based on possibility distributions

Li, Der-Chiang; Shi, Qi-Shi; Chen, Hung-Yu

doi:10.1007/s13042-018-00905-2

Original Article
Published: 08 April 2019

Volume 10, pages 2805–2822, (2019)

International Journal of Machine Learning and Cybernetics

Abstract

Learning from small data is challenging for most algorithms because it is hard to build statistically robust models from few observations. Previous studies have shown that virtual sample generation (VSG) approaches are effective in meeting this challenge. However, most VSG methods were developed for numerical inputs. To address situations where the data have nominal inputs and continuous outputs, we propose a systematic VSG procedure that generates samples using fuzzy techniques to further enhance modelling capability. Based on the data preprocessing concept of the M5′ model tree, we present a procedure for extracting the fuzzy relations between nominal inputs and continuous outputs. Further, following the idea of nonparametric operations, we employ trend similarity to express the fuzzy relations between inputs and outputs. These relations are then represented by possibility distributions, and sample candidates are created from these distributions. Finally, the candidates that pass an \(\alpha\)-cut filter are regarded as qualified virtual samples. In the experiments, we demonstrate the effectiveness of our approach through a comparison with two other VSG approaches using five public datasets and two prediction models. We also discuss the three parameters used in our approach. Determining the best-fitting parameter values requires further study.
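The abstract does not give the construction in detail, so the following is only a rough sketch of the general idea (order nominal values by their mean output, build a possibility distribution per nominal value, draw candidates, keep those passing an \(\alpha\)-cut), not the authors' procedure. The triangular shape of the distribution, the use of the group minimum, mean, and maximum as its support and peak, the uniform candidate sampling, and all function names are assumptions made for illustration. The trend-similarity step is not modelled here.

```python
import random
from collections import defaultdict
from statistics import mean


def order_nominal_values(samples):
    """Group outputs by nominal value and rank the values by mean output.

    This mirrors the ordering step commonly described for M5' preprocessing;
    how the paper uses it exactly is not stated in the abstract.
    """
    groups = defaultdict(list)
    for value, y in samples:
        groups[value].append(y)
    ranked = sorted(groups, key=lambda v: mean(groups[v]))
    return ranked, groups


def triangular_possibility(y, low, peak, high):
    """Possibility of output y under an assumed triangular distribution."""
    if y < low or y > high:
        return 0.0
    if y == peak:
        return 1.0
    if y < peak:
        return (y - low) / (peak - low)
    return (high - y) / (high - peak)


def generate_virtual_samples(samples, alpha=0.5, n_candidates=100, seed=0):
    """Draw candidates per nominal value and keep those in the alpha-cut."""
    rng = random.Random(seed)
    ranked, groups = order_nominal_values(samples)
    virtual = []
    for value in ranked:
        ys = groups[value]
        low, peak, high = min(ys), mean(ys), max(ys)  # assumed parameters
        for _ in range(n_candidates):
            y = rng.uniform(low, high)  # assumed candidate generator
            possibility = triangular_possibility(y, low, peak, high)
            if possibility >= alpha:  # alpha-cut filter
                virtual.append((value, y, possibility))
    return virtual


if __name__ == "__main__":
    data = [("A", 1.0), ("A", 3.0), ("B", 10.0), ("B", 12.0), ("B", 14.0)]
    for value, y, possibility in generate_virtual_samples(data, n_candidates=3):
        print(value, round(y, 3), round(possibility, 3))
```

Raising `alpha` keeps only candidates close to the peak of each distribution, which trades sample diversity for plausibility; this is one of the parameter choices that, per the abstract, still needs further study.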

Author information

Authors and Affiliations

Department of Industrial and Information Management, National Cheng Kung University, Tainan, Taiwan
Der-Chiang Li & Qi-Shi Shi
Institute of Information Management, National Cheng Kung University, Tainan, Taiwan
Hung-Yu Chen

Corresponding author

Correspondence to Der-Chiang Li.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, DC., Shi, QS. & Chen, HY. Building robust models for small data containing nominal inputs and continuous outputs based on possibility distributions. Int. J. Mach. Learn. & Cyber. 10, 2805–2822 (2019). https://doi.org/10.1007/s13042-018-00905-2

Received: 05 April 2018
Accepted: 14 December 2018
Published: 08 April 2019
Issue Date: October 2019
DOI: https://doi.org/10.1007/s13042-018-00905-2
