Abstract
The Support Vector Machine (SVM) has achieved strong classification performance. However, because it relies only on local information (the support vectors), it is sensitive to directions with large data spread. Nonparametric Discriminant Analysis (NDA), on the other hand, improves on the more general Linear Discriminant Analysis (LDA) by relaxing LDA's normality assumption. Furthermore, NDA incorporates partially global information to detect the dominant normal directions to the decision surface, which represent the true data spread. However, NDA depends on the choice of the κ-nearest neighbors (κ-NNs) on the decision boundary. This paper introduces a novel Combined SVM and NDA (CSVMNDA) model that controls the spread of the data while maximizing a relative margin separating the data classes. The model can be viewed as an improvement of SVM that incorporates the data-spread information carried by the dominant normal directions to the decision boundary. It can equally be viewed as an extension of NDA in which the support vectors improve the choice of the κ-NNs on the decision boundary by incorporating local information. Because it extends both SVM and NDA, the model can handle heteroscedastic and non-normal data, and it avoids the small sample size problem. Notably, the proposed improvements require only a rigorous yet simple combination of the NDA and SVM objective functions, and they preserve the computational efficiency of SVM. Optimizing the CSVMNDA objective function yields significant performance gains on real-world problems. In particular, experiments on face recognition clearly show the superiority of CSVMNDA over other state-of-the-art classification methods, especially SVM and NDA.
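To give a concrete picture of the kind of combination described above, the sketch below shows one plausible form such a joint objective could take. It is an illustration only, not the paper's exact formulation: the nonparametric scatter matrix \(\mathbf{S}\) (assumed here to encode the data spread detected by NDA from the κ-NN differences) and the trade-off parameters \(\lambda\) and \(C\) are assumptions introduced for the example.

\[
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\;
\frac{1}{2}\,\mathbf{w}^{\top}\!\left(\mathbf{I} + \lambda\,\mathbf{S}\right)\mathbf{w}
\;+\; C\sum_{i=1}^{n}\xi_{i}
\qquad\text{subject to}\qquad
y_{i}\!\left(\mathbf{w}^{\top}\mathbf{x}_{i} + b\right) \ge 1 - \xi_{i},
\quad \xi_{i} \ge 0,\; i = 1,\dots,n.
\]

With \(\lambda = 0\) this reduces to the standard soft-margin SVM; the additional \(\mathbf{w}^{\top}\mathbf{S}\mathbf{w}\) term penalizes weight components along directions of large spread, which is one simple way to trade the SVM margin against the NDA spread information while keeping a quadratic program of the same form, consistent with the claim that the combination preserves the computational efficiency of SVM.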




Notes
The data sets can be obtained from http://www.first.gmd.de/raetsch/