Abstract
We introduce a new class of kernels between distributions. These induce a kernel on the input space between data points by associating to each datum a generative model fit to the data point individually. The kernel is then computed by integrating the product of the two generative models corresponding to two data points. This kernel permits discriminative estimation via, for instance, support vector machines, while exploiting the properties, assumptions, and invariances inherent in the choice of generative model. It satisfies Mercer’s condition and can be computed in closed form for a large class of models, including exponential family models, mixtures, hidden Markov models and Bayesian networks. For other models the kernel can be approximated by sampling methods. Experiments are shown for multinomial models in text classification and for hidden Markov models for protein sequence classification.
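For intuition, the kernel described in the abstract is a probability product kernel: for two multinomial distributions $p$ and $q$ fit to two documents, it reduces to $K(p,q)=\sum_i (p_i\, q_i)^{\rho}$, where $\rho = 1/2$ gives the Bhattacharyya kernel and $\rho = 1$ the expected likelihood kernel. A minimal sketch for the multinomial case (function names are illustrative, not from the paper):

```python
import numpy as np

def probability_product_kernel(p, q, rho=0.5):
    """Probability product kernel between two multinomials:
    K(p, q) = sum_i (p_i * q_i)**rho.
    rho = 0.5 -> Bhattacharyya kernel (K = 1 iff p == q);
    rho = 1.0 -> expected likelihood kernel."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum((p * q) ** rho))

def multinomial_mle(counts):
    """Fit a multinomial to a single datum (e.g. a word-count
    vector for one document) by maximum likelihood."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

# Two toy "documents" as word-count vectors over a shared vocabulary.
p = multinomial_mle([3, 1, 0, 2])
q = multinomial_mle([2, 2, 1, 1])
k_bhatt = probability_product_kernel(p, q, rho=0.5)
k_el = probability_product_kernel(p, q, rho=1.0)
```

Because the kernel is an inner product between (powers of) the two densities, it satisfies Mercer's condition, and the resulting Gram matrix can be passed directly to an SVM. For models without a closed-form integral, the same quantity can be approximated by Monte Carlo sampling from the two fitted models.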
© 2003 Springer-Verlag Berlin Heidelberg
Jebara, T., Kondor, R. (2003). Bhattacharyya and Expected Likelihood Kernels. In: Schölkopf, B., Warmuth, M.K. (eds) Learning Theory and Kernel Machines. Lecture Notes in Computer Science(), vol 2777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45167-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40720-1
Online ISBN: 978-3-540-45167-9