
Combining partially global and local characteristics for improved classification

  • Original Article
  • Published:
International Journal of Machine Learning and Cybernetics

Abstract

The Support Vector Machine (SVM) has achieved promising classification performance. However, because it relies only on local information (the support vectors), it is sensitive to directions with large data spread. Nonparametric Discriminant Analysis (NDA), on the other hand, improves on the more general Linear Discriminant Analysis (LDA) by relaxing LDA's normality assumption. Furthermore, NDA incorporates partially global information to detect the dominant directions normal to the decision surface, which represent the true data spread. However, NDA depends on the choice of the κ-nearest neighbors (κ-NNs) on the decision boundary. This paper introduces a novel Combined SVM and NDA (CSVMNDA) model that controls the spread of the data while maximizing a relative margin separating the data classes. The model can be viewed as an improvement to SVM that incorporates the data-spread information represented by the dominant directions normal to the decision boundary, and equally as an extension of NDA in which the support vectors contribute local information that improves the choice of the κ-nearest neighbors on the decision boundary. Since the model extends both SVM and NDA, it can handle heteroscedastic and non-normal data, and it avoids the small sample size problem. Interestingly, the proposed improvements require only a rigorous yet simple combination of the NDA and SVM objective functions, and they preserve the computational efficiency of SVM. Optimizing the CSVMNDA objective function yielded notable performance gains on real-world problems. In particular, experiments on face recognition clearly show the superiority of CSVMNDA over other state-of-the-art classification methods, especially SVM and NDA.
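The kind of combination the abstract describes — an SVM-style hinge-loss margin objective augmented with a regularizer built from NDA's nonparametric data-spread information — can be illustrated with a small sketch. This is a hypothetical reconstruction for illustration only, not the authors' formulation: the scatter estimate, the λ trade-off parameter, the function names, and the subgradient solver are all assumptions, and the paper's actual objective and optimization procedure may differ.

```python
import numpy as np

def nda_between_scatter(X, y, k=3):
    """NDA-style nonparametric between-class scatter: each point is compared
    with the mean of its k nearest neighbours from the OTHER class, so the
    scatter is dominated by directions normal to the decision boundary."""
    d = X.shape[1]
    S = np.zeros((d, d))
    for c in np.unique(y):
        own, other = X[y == c], X[y != c]
        for x in own:
            dist = np.linalg.norm(other - x, axis=1)
            m = other[np.argsort(dist)[:k]].mean(axis=0)  # local k-NN mean
            diff = (x - m)[:, None]
            S += diff @ diff.T
    return S / len(X)

def fit_combined(X, y, lam=0.5, C=1.0, lr=0.01, epochs=200):
    """Minimise  w'(I + lam*S)w + C * sum hinge(y_i (w'x_i + b))  by
    subgradient descent; lam blends the NDA spread term into the SVM
    regulariser (lam=0 recovers a plain linear SVM objective)."""
    S = nda_between_scatter(X, y)
    R = np.eye(X.shape[1]) + lam * S
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                       # hinge-loss violators
        gw = 2 * R @ w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        gb = -C * y[viol].sum()
        w -= lr * gw
        b -= lr * gb
    return w, b

# Toy two-class problem: well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
w, b = fit_combined(X, y)
acc = np.mean(np.sign(X @ w + b) == y)
print(f"training accuracy: {acc:.2f}")
```

On this toy data the spread term changes little, but on data with a large within-class spread along the boundary the `lam * S` term penalizes weight directions with high nonparametric scatter, which is the intuition behind combining the two objectives.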



Notes

  1. The data sets can be obtained from http://www.first.gmd.de/raetsch/



Author information


Corresponding author

Correspondence to Riadh Ksantini.


About this article

Cite this article

Ksantini, R., Boufama, B. Combining partially global and local characteristics for improved classification. Int. J. Mach. Learn. & Cyber. 3, 119–131 (2012). https://doi.org/10.1007/s13042-011-0045-9

