
Perturbation-based classifier

  • Methodologies and Application
  • Published in: Soft Computing

Abstract

The Bayes classifier depends on the class-conditional densities and the prior probabilities. Among the many possible density functions, the Gaussian density has received the most attention, mainly because of its analytical tractability. The parameters of the Bayes classifier for Gaussian-distributed data are generally unknown, so estimates are computed for the mean vector \(\hat{\boldsymbol{\mu}}\) and the covariance matrix \(\hat{\Sigma}\). When a pattern is inserted into the training set of class \(\omega_i\), the parameter estimates \(\hat{\boldsymbol{\mu}}_i\) and \(\hat{\Sigma}_i\) change by amounts \(\Delta\hat{\boldsymbol{\mu}}_i\) and \(\Delta\hat{\Sigma}_i\), respectively. Since inserting a single pattern causes such a perturbation, we claim that this perturbation can be used for supervised classification. Based on this assumption, we propose a supervised classifier, called the Perturbation-based Classifier (PerC), that assigns to a query pattern the class exhibiting the smallest perturbation after the query pattern is inserted into it. The rationale is that adding a pattern that truly belongs to a class should not alter the distribution of that class much. PerC uses only the perturbations (\(\Delta\hat{\boldsymbol{\mu}}_i\) and \(\Delta\hat{\Sigma}_i\)) to decide the class of a query pattern, so it is a parameter-free classifier. The proposed method was assessed on 21 datasets from the UCI Machine Learning Repository, and its results were compared with classifiers from the literature. The results show that PerC obtains very competitive recognition rates.
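To make the perturbations concrete: with \(n_i\) training patterns in class \(\omega_i\), inserting a query \(\boldsymbol{x}\) shifts the sample mean and the maximum-likelihood (\(1/n\)) covariance estimate by

\[
\Delta\hat{\boldsymbol{\mu}}_i = \frac{\boldsymbol{x} - \hat{\boldsymbol{\mu}}_i}{n_i + 1}, \qquad
\Delta\hat{\Sigma}_i = \frac{n_i}{(n_i + 1)^2}\,(\boldsymbol{x} - \hat{\boldsymbol{\mu}}_i)(\boldsymbol{x} - \hat{\boldsymbol{\mu}}_i)^{\mathsf{T}} - \frac{1}{n_i + 1}\,\hat{\Sigma}_i.
\]

The sketch below (Python; the function name and parameters are hypothetical) simply recomputes both estimates after the insertion and scores each class by \(\|\Delta\hat{\boldsymbol{\mu}}_i\| + \alpha\,\|\Delta\hat{\Sigma}_i\|_F\). The additive combination and the weight alpha are assumptions made for illustration; the paper's exact perturbation measure is defined in the full text.

```python
import numpy as np

def perc_predict(class_data, x, alpha=1.0):
    """Assign x to the class whose mean/covariance estimates change
    the least when x is inserted into that class's training set.

    class_data : dict mapping label -> (n_i, d) array of training patterns
    alpha      : assumed weight balancing the two perturbation terms
    """
    best_label, smallest = None, np.inf
    for label, X in class_data.items():
        mu = X.mean(axis=0)
        sigma = np.cov(X, rowvar=False, bias=True)   # MLE (1/n) estimator
        X_plus = np.vstack([X, x])                   # insert the query pattern
        d_mu = X_plus.mean(axis=0) - mu
        d_sigma = np.cov(X_plus, rowvar=False, bias=True) - sigma
        # Perturbation score: ||dmu|| + alpha * ||dSigma||_F (assumed combination)
        score = np.linalg.norm(d_mu) + alpha * np.linalg.norm(d_sigma, ord="fro")
        if score < smallest:
            best_label, smallest = label, score
    return best_label

# Toy usage: two Gaussian blobs; a query near class 1's mean should barely
# perturb class 1 but noticeably shift class 0's estimates.
rng = np.random.default_rng(0)
data = {0: rng.normal(0.0, 1.0, size=(50, 2)),
        1: rng.normal(4.0, 1.0, size=(50, 2))}
print(perc_predict(data, np.array([3.8, 4.1])))  # expected: 1
```

Note that both perturbation norms shrink as \(n_i\) grows, so strongly unbalanced classes would bias this naive score toward the larger class; the sketch assumes roughly balanced classes.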


Notes

  1. To be more accurate, we should also require that \(\|\Delta\boldsymbol{x}\| \rightarrow 0\). However, we have not assumed this because the query vector \(\boldsymbol{x}\) itself does not change.



Acknowledgements

The authors would like to thank the Brazilian agencies CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior), CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico), and FACEPE (Fundação de Amparo à Ciência e Tecnologia de Pernambuco).

Author information


Corresponding author

Correspondence to George D. C. Cavalcanti.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal rights statement

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Araújo, E.L., Cavalcanti, G.D.C. & Ren, T.I. Perturbation-based classifier. Soft Comput 24, 16565–16576 (2020). https://doi.org/10.1007/s00500-020-04960-2

