ABSTRACT
When given a small sample, we show that classification with SVM can be considerably enhanced by using a kernel function learned from the training data prior to discrimination. This kernel is also shown to enhance retrieval based on data similarity. Specifically, we describe KernelBoost, a boosting algorithm which computes a kernel function as a combination of 'weak' space partitions. The kernel learning method naturally incorporates domain knowledge in the form of unlabeled data (i.e. in semi-supervised or transductive settings), and also in the form of labeled samples from relevant related problems (i.e. in a learning-to-learn scenario). The latter goal is accomplished by learning a single kernel function for all classes. We show comparative evaluations of our method on datasets from the UCI repository. We demonstrate performance enhancement on two challenging tasks: digit classification with kernel SVM, and facial image retrieval based on image similarity as measured by the learned kernel.
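The core construction described in the abstract, a kernel assembled as a weighted combination of 'weak' space partitions, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' KernelBoost procedure: the partitions here are arbitrary hypothetical axis-aligned splits, and the weights `alphas` merely stand in for coefficients a boosting stage would produce. Each weak partition induces a kernel that is 1 when two points fall in the same cell and 0 otherwise; a non-negative weighted sum of such kernels is symmetric and positive semi-definite, so it is a valid kernel for SVM classification or similarity-based retrieval.

```python
import numpy as np

def weak_partition_kernel(X, partitions, alphas):
    """Combine 'weak' space partitions into a kernel matrix.

    Each partition maps every point to a cell label; its weak kernel is
    the 0/1 indicator that two points share a cell. The combined kernel
    is the alpha-weighted sum of these agreement indicators.
    """
    n = X.shape[0]
    K = np.zeros((n, n))
    for alpha, part in zip(alphas, partitions):
        labels = part(X)  # one cell label per point
        # Outer comparison: K_t[i, j] = 1 iff points i and j share a cell
        K += alpha * (labels[:, None] == labels[None, :])
    return K

# Toy usage: two hypothetical axis-aligned threshold partitions
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
partitions = [lambda X, d=d, t=t: (X[:, d] > t).astype(int)
              for d, t in [(0, 0.0), (1, 0.0)]]
alphas = [0.7, 0.3]
K = weak_partition_kernel(X, partitions, alphas)
```

The resulting matrix `K` can be passed directly to an SVM that accepts a precomputed kernel; because every weak term is a block-structured PSD matrix and the weights are non-negative, the combination remains PSD regardless of which partitions the boosting process selects.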
Index Terms
- Learning a kernel function for classification with small training samples