Abstract
For most practical supervised learning applications, the training datasets are often linearly nonseparable based on the traditional Euclidean metric. To strive for more effective classification capability, a new and flexible distance metric has to be adopted. There exist a great variety of kernel-based classifiers, each with their own favorable domain of applications. They are all based on a new distance metric induced from a kernel-based inner-product. It is also known that classifier’s effectiveness depends strongly on the distribution of training and testing data. The problem lies in that we just do not know in advance the right models for the observation data and measurement noise. As a result, it is impossible to pinpoint an appropriate model for the best tradeoff between the classifier’s training accuracy and error resilience. The objective of this paper is to develop a versatile classifier endowed with a broad array of parameters to cope with various kinds of real-world data. More specifically, a so-called PDA-SVM Hybrid is proposed as a unified model for kernel-based supervised classification. This paper looks into the interesting relationship between existing classifiers (such as KDA, PDA, and SVM) and explains why they are special cases of the unified model. It further explores the effects of key parameters on various aspects of error analysis. Finally, simulations were conducted on UCI and biological data and their performance compared.
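The abstract's remark that kernel-based classifiers operate on "a new distance metric induced from a kernel-based inner-product" can be made concrete. The sketch below (a minimal illustration, not taken from the paper) computes the induced squared distance d(x, y)² = K(x, x) − 2K(x, y) + K(y, y); with the linear kernel it recovers the ordinary Euclidean metric, while the Gaussian (RBF) kernel yields a bounded, nonlinear metric.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Gaussian (RBF) kernel: inner product in the induced feature space.
    return np.exp(-gamma * np.sum((x - y) ** 2))

def kernel_distance_sq(x, y, kernel):
    # Squared distance induced by a kernel inner product:
    # d(x, y)^2 = K(x, x) - 2 K(x, y) + K(y, y).
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)

x = np.array([1.0, 2.0])
y = np.array([2.0, 0.0])

d2_linear = kernel_distance_sq(x, y, lambda a, b: a @ b)  # recovers ||x - y||^2 = 5
d2_rbf = kernel_distance_sq(x, y, rbf_kernel)             # bounded in [0, 2]
```

Swapping the kernel changes the metric without changing the classifier's algebra, which is why the choice of kernel (and its parameters) governs the accuracy/resilience tradeoff the abstract describes.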
Notes
Generally speaking, the formula for an optimal linear decision function is x^T w + b. For the special case considered here, we happen to have b = 0.
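As a minimal illustration of the note above (the weight vector here is hypothetical, chosen only for the example), the decision function f(x) = x^T w + b assigns a class by the sign of f(x); the special case simply sets b = 0:

```python
import numpy as np

# Hypothetical weight vector for illustration; the note's special case has b = 0.
w = np.array([0.5, -1.0])
b = 0.0

def decide(x, w, b=0.0):
    # Linear decision function f(x) = x^T w + b; the class label is sign(f(x)).
    return np.sign(x @ w + b)

label = decide(np.array([2.0, 0.5]), w)  # f = 1.0 - 0.5 = 0.5 > 0, so label = +1
```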
Note that a vector with α_i &lt; 0 will have a safety margin greater than or equal to 1.0. Such vectors are arguably too far away from the decision boundary and may therefore be regarded as non-critical for decision making. Therefore, in the conventional SVM, they are excluded from the pool of selected vectors (i.e., those with nonzero α_i's).
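The conventional SVM's selection rule described above can be checked on a toy separable set whose hard-margin solution is known in closed form (this example is illustrative, not from the paper): the two points on the margin carry nonzero multipliers, while the point with safety margin above 1.0 is excluded.

```python
import numpy as np

# Toy separable set with a known hard-margin SVM solution:
# x1, x2 are support vectors (alpha = 0.5); x3 lies beyond the margin.
X = np.array([[1.0, 0.0],
              [-1.0, 0.0],
              [2.0, 0.5]])
y = np.array([1.0, -1.0, 1.0])
alpha = np.array([0.5, 0.5, 0.0])  # KKT multipliers for this toy problem

w = (alpha * y) @ X        # w = sum_i alpha_i y_i x_i = (1, 0)
b = 0.0
margins = y * (X @ w + b)  # safety margin y_i (x_i^T w + b) of each sample

# Support vectors are those with nonzero alpha; x3 has margin 2 >= 1,
# so it is non-critical and excluded from the selected pool.
support = np.flatnonzero(alpha != 0.0)
```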
References
Kung, S. Y. (2009). Kernel approaches to unsupervised and supervised machine learning. In Proc. PCM’2009. Lecture notes in computer science (Vol. 5879, pp. 1–32). Springer-Verlag.
Aizerman, M., et al. (1964). Theoretical foundation of the potential function method in pattern recognition learning. Automation and Remote Control, 25, 821–837.
Vapnik, V. N. (1995). The nature of statistical learning theory. New York: Springer-Verlag.
Mitchell, T. M. (1997). Machine learning. McGraw-Hill.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, 179–188.
McLachlan, G. J. (1992). Discriminant analysis and statistical pattern recognition. John Wiley & Sons.
Friedman, J. (1989). Regularized discriminant analysis. Journal of the American Statistical Association, 84, 165–175.
Fukunaga, K. (1990). Introduction to statistical pattern recognition. Boston: Academic.
Schölkopf, B., Burges, C. J. C., & Smola, A. J. (1999). Advances in kernel methods: Support vector learning. Cambridge: MIT Press.
Mika, S., Ratsch, G., Weston, J., Scholkopf, B., & Mullers, K. R. (1999). Fisher discriminant analysis with kernels. In Y. H. Hu, J. Larsen, E. Wilson, & S. Douglas (Eds.), Neural networks for signal processing IX (pp. 41–48).
Mika, S., Ratsch, G., & Muller, K. R. (2001). A mathematical programming approach to the kernel Fisher algorithm. Advances in Neural Information Processing Systems, 13, 591–597.
Mika, S., Smola, A. J., & Scholkopf, B. (2001). An improved training algorithm for kernel Fisher discriminants. In T. Jaakkola, & T. Richardson (Eds.), Proceedings AISTATS (Vol. 2001, pp. 98–104). San Francisco: Morgan Kaufmann.
Muller, K. R., Mika, S., Ratsch, G., Tsuda, K., & Scholkopf, B. (2001). An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks, 12(2), 181–201.
Gestel, T. V., Suykens, J. A. K., Lanckriet, G., Lambrechts, A., Moor, B. D., & Vandewalle, J. (2002). Bayesian framework for least-squares support vector machine classifiers, Gaussian processes, and kernel Fisher discriminant analysis. Neural Computation, 14(5), 1115–1147.
Woodbury, M. A. (1950). Inverting modified matrices. Memorandum Report 42, Statistical Research Group, Princeton University, Princeton, NJ. MR38136.
Joachims, T. (1999). Making large-scale SVM learning practical. In B. Schölkopf, C. Burges, & A. Smola (Eds.), Advances in Kernel methods—Support vector learning. Cambridge: MIT Press.
Schwaighofer, A. (2005). SVM toolbox for Matlab.
Pochet, N., De Smet, F., Suykens, J. A. K., & De Moor, B. L. R. (2004). Systematic benchmarking of microarray data classification: Assessing the role of nonlinearity and dimensionality reduction. Bioinformatics, 20(17), 3185–3195.
Iizuka, N., et al. (2003). Oligonucleotide microarray for prediction of early intrahepatic recurrence of hepatocellular carcinoma after curative resection. The Lancet, 361(9361), 923–929.
Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D., et al. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences, 96(12), 6745–6750.
Nutt, C. L., et al. (2003). Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Research, 63(7), 1602–1607.
Golub, T. R., Slonim, D. K., Huard, C., Tamayo, P., Gaasenbeek, M., Mesirov, J. P., et al. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531–537.
Singh, D., et al. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2), 203–209.
van ’t Veer, L., et al. (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530–535.
Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46, 389–422.
Additional information
This manuscript was based on the keynote paper at PCM2009 by Kung [1]. This work benefited greatly from our research collaboration with Ms. Yuhui Luo of Princeton University. The work was supported in part by The Hong Kong Research Grants Council, Grant Nos. PolyU5251/08E and PolyU5264/09E. Some of the research was conducted while S.Y. Kung was a Distinguished Visiting Professor at The University of Hong Kong.
Cite this article
Kung, S.Y., Mak, MW. PDA-SVM Hybrid: A Unified Model for Kernel-Based Supervised Classification. J Sign Process Syst 65, 5–21 (2011). https://doi.org/10.1007/s11265-011-0588-8