DOI: 10.1145/1143844.1143895
Article

Learning a kernel function for classification with small training samples

Published: 25 June 2006

ABSTRACT

We show that, given a small training sample, classification with SVM can be considerably enhanced by using a kernel function learned from the training data prior to discrimination. This kernel is also shown to enhance retrieval based on data similarity. Specifically, we describe KernelBoost - a boosting algorithm which computes a kernel function as a combination of 'weak' space partitions. The kernel learning method naturally incorporates domain knowledge in the form of unlabeled data (i.e. in a semi-supervised or transductive setting), and also in the form of labeled samples from relevant related problems (i.e. in a learning-to-learn scenario). The latter goal is accomplished by learning a single kernel function for all classes. We show comparative evaluations of our method on datasets from the UCI repository. We demonstrate performance enhancement on two challenging tasks: digit classification with kernel SVM, and facial image retrieval based on image similarity as measured by the learnt kernel.
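The idea of building a kernel as a weighted combination of 'weak' space partitions can be illustrated with a minimal sketch. This is not the paper's algorithm: the weak partitions here are toy single-feature threshold splits and the weights `alphas` are placeholders (the paper learns both by boosting, using labeled and unlabeled data). The sketch only shows that summing same-cell indicators with nonnegative weights yields a symmetric, positive semi-definite kernel matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))          # toy data: 6 points in 2-D

def weak_partition(X, feature, threshold):
    """Assign each point to cell 0 or 1 by thresholding one feature."""
    return (X[:, feature] > threshold).astype(int)

def combined_kernel(X, partitions, alphas):
    """K[i, j] = sum_t alpha_t * 1[x_i and x_j share a cell of partition t]."""
    n = X.shape[0]
    K = np.zeros((n, n))
    for (feat, thr), a in zip(partitions, alphas):
        cells = weak_partition(X, feat, thr)
        K += a * (cells[:, None] == cells[None, :])
    return K

# Hypothetical (feature, threshold) weak partitions and placeholder weights.
partitions = [(0, 0.0), (1, 0.0), (0, 0.5)]
alphas = [0.5, 0.3, 0.2]

K = combined_kernel(X, partitions, alphas)
assert np.allclose(K, K.T)                     # symmetric
assert np.all(np.linalg.eigvalsh(K) > -1e-9)   # positive semi-definite
```

Each same-cell indicator matrix is block-diagonal up to a permutation (a block of ones per cell), hence PSD; a nonnegative combination of PSD matrices is PSD, so the result is a valid kernel that could be passed, e.g., to an SVM accepting precomputed kernels.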


Published in

ICML '06: Proceedings of the 23rd International Conference on Machine Learning
June 2006, 1154 pages
ISBN: 1595933832
DOI: 10.1145/1143844

          Copyright © 2006 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ICML '06 paper acceptance rate: 140 of 548 submissions, 26%. Overall acceptance rate: 140 of 548 submissions, 26%.
