Abstract
In this manuscript we derive a non-parametric version of the Fisher kernel. The result follows from Non-negative Matrix Factorization (NMF) with the Kullback-Leibler divergence: by imposing suitable normalization conditions, the obtained factorization can be interpreted as a mixture of densities, with no assumptions on the distribution of the parameters. The equivalence between the Kullback-Leibler divergence and the log-likelihood then leads to kernelization by simply taking partial derivatives. The estimates provided by this kernel retain the consistency of the Fisher kernel.
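The construction in the abstract can be illustrated with a short sketch: fit an NMF under the generalized KL divergence with the classic Lee-Seung multiplicative updates, then take the gradient of the log-likelihood with respect to the mixing weights as the Fisher-score map. This is a minimal illustration under our own naming (`nmf_kl`, `fisher_score` are not the paper's identifiers), not the authors' implementation.

```python
import numpy as np

def nmf_kl(X, k, n_iter=300, seed=0):
    # NMF under the generalized KL (I-) divergence via the classic
    # Lee-Seung multiplicative updates. Illustrative sketch only.
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + 0.1
    H = rng.random((k, n)) + 0.1
    for _ in range(n_iter):
        WH = W @ H
        W *= ((X / WH) @ H.T) / H.sum(axis=1)
        WH = W @ H
        H *= (W.T @ (X / WH)) / W.sum(axis=0)[:, None]
    return W, H

def fisher_score(x, W, h):
    # Gradient of the log-likelihood sum_i x_i * log((W h)_i) with
    # respect to the mixing weights h: (W^T (x / (W h)))_a.
    return W.T @ (x / (W @ h))
```

A Fisher kernel between two samples would then be an inner product of such score vectors, possibly weighted by the inverse Fisher information; that refinement is omitted here.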
Notes
- 1.
In the NMF framework it is more convenient to write the matrix \(\mathbf {X}=(x_{ij})\) (with subscripts i and j as defined in (1)) as
$$\begin{aligned} [\mathbf {X}]_{ij}=\mathbf {X} \end{aligned}$$This notation is convenient when some index is allowed to vary (typically k in (4)). If the matrix operations do not depend on the varying index, eliding the subscripts simplifies the notation (as happens for diagonal matrices). For the remaining objects we use the usual conventions: matrices are denoted by bold capital letters and vectors by bold lowercase letters.
- 2.
Several authors refer to the KL divergence as
$$\begin{aligned} D_{KL}(\mathbf {Y}\,\Vert \,\mathbf {W\,H}) = \sum _{ij}\Big ([\mathbf {Y}]_{ij}\log \frac{[\mathbf {Y}]_{ij}}{[\mathbf {W\,H}]_{ij}}-[\mathbf {Y}]_{ij}+[\mathbf {W\,H}]_{ij}\Big ) \end{aligned}$$which we prefer to call the I-divergence or generalized KL divergence, following [6, p. 105], reserving the name KL divergence for the mean information (given by formula (6)), in keeping with the original definition of S. Kullback and R.A. Leibler [20].
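In code, the I-divergence of this note is the entrywise KL term plus the linear correction terms; when both arguments have equal total mass the corrections cancel and it reduces to the mean-information form. A small sketch (assuming dense NumPy arrays; the function name is ours):

```python
import numpy as np

def i_divergence(Y, Z):
    # Generalized KL (I-) divergence D(Y || Z) between nonnegative
    # matrices, matching the displayed formula term by term.
    Y = np.asarray(Y, dtype=float)
    Z = np.asarray(Z, dtype=float)
    mask = Y > 0  # the convention 0 * log(0/z) = 0
    return float(np.sum(Y[mask] * np.log(Y[mask] / Z[mask]))
                 - Y.sum() + Z.sum())
```

As a Bregman divergence it is nonnegative and vanishes exactly when the two arguments coincide.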
References
Amari, S.: Information Geometry and its Applications. Springer, Tokyo (2016). https://doi.org/10.1007/978-4-431-55978-8
Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68(3), 337–404 (1950)
Bredensteiner, E.J., Bennett, K.P.: Multicategory classification by support vector machines. In: Pang, J.S. (ed.) Computational Optimization, pp. 53–79. Springer, Boston (1999). https://doi.org/10.1007/978-1-4615-5197-3_5
Chappelier, J.-C., Eckard, E.: PLSI: the true Fisher kernel and beyond. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds.) ECML PKDD 2009. LNCS (LNAI), vol. 5781, pp. 195–210. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04180-8_30
Chen, J.C.: The nonnegative rank factorizations of nonnegative matrices. Linear Algebra Appl. 62, 207–217 (1984)
Cichocki, A., Zdunek, R., Phan, A.H., Amari, S.I.: Nonnegative Matrix and Tensor Factorizations. Wiley, Hoboken (2009)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995)
Ding, C., Li, T., Peng, W.: On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing. Comput. Statist. Data Anal. 52(8), 3913–3927 (2008)
Ding, C., He, X., Simon, H.D.: On the equivalence of nonnegative matrix factorization and spectral clustering. In: Proceedings of the 2005 SIAM International Conference on Data Mining, pp. 606–610 (2005). https://doi.org/10.1137/1.9781611972757.70
Dua, D., Graff, C.: UCI machine learning repository (2017). School of Information and Computer Sciences, University of California, Irvine. http://archive.ics.uci.edu/ml
Durgesh, K.S., Lekha, B.: Data classification using support vector machine. J. Theor. Appl. Inf. Technol. 12(1), 1–7 (2010)
Elkan, C.: Deriving TF-IDF as a Fisher kernel. In: Consens, M., Navarro, G. (eds.) SPIRE 2005. LNCS, vol. 3772, pp. 295–300. Springer, Heidelberg (2005). https://doi.org/10.1007/11575832_33
Figuera, P., García Bringas, P.: On the probabilistic latent semantic analysis generalization as the singular value decomposition probabilistic image. J. Stat. Theory Appl. 19, 286–296 (2020). https://doi.org/10.2991/jsta.d.200605.001
Franke, B., et al.: Statistical inference, learning and models in big data. Int. Stat. Rev. 84(3), 371–389 (2016)
Hofmann, T.: Learning the similarity of documents: an information-geometric approach to document retrieval and categorization. In: Advances in Neural Information Processing Systems, pp. 914–920 (2000)
Hofmann, T., Schölkopf, B., Smola, A.J.: Kernel methods in machine learning. Ann. Stat. 36, 1171–1220 (2008)
Hofmann, T.: Unsupervised learning by probabilistic latent semantic analysis. Mach. Learn. 42(1–2), 177–196 (2001)
Hsu, C.W., Lin, C.J.: A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Netw. 13(2), 415–425 (2002)
Jaakkola, T.S., Haussler, D., et al.: Exploiting generative models in discriminative classifiers. In: Advances in Neural Information Processing Systems, pp. 487–493 (1999)
Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)
Latecki, L.J., Sobel, M., Lakaemper, R.: New EM derived from Kullback-Leibler divergence. In: KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 267–276 (2006)
Lee, H., Cichocki, A., Choi, S.: Kernel nonnegative matrix factorization for spectral EEG feature extraction. Neurocomputing 72(13–15), 3182–3190 (2009)
Martens, J.: New insights and perspectives on the natural gradient method. J. Mach. Learn. Res. 21, 1–76 (2020)
Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., Leisch, F.: e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien (2021). R package version 1.7-6. https://CRAN.R-project.org/package=e1071
Naik, G.R.: Non-negative Matrix Factorization Techniques. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-48331-2
Salcedo-Sanz, S., Rojo-Álvarez, J.L., Martínez-Ramón, M., Camps-Valls, G.: Support vector machines in engineering: an overview. Wiley Interdiscip. Rev. Data Mining Knowl. Discov. 4(3), 234–267 (2014)
Tsuda, K., Akaho, S., Kawanabe, M., Müller, K.R.: Asymptotic properties of the Fisher kernel. Neural Comput. 16(1), 115–137 (2004)
Zhang, D., Zhou, Z.-H., Chen, S.: Non-negative matrix factorization on kernels. In: Yang, Q., Webb, G. (eds.) PRICAI 2006. LNCS (LNAI), vol. 4099, pp. 404–412. Springer, Heidelberg (2006). https://doi.org/10.1007/978-3-540-36668-3_44
Zhang, X.D.: Matrix Analysis and Applications. Cambridge University Press, Cambridge (2017)
© 2021 Springer Nature Switzerland AG
Cite this paper
Figuera, P., Bringas, P.G. (2021). A Non-parametric Fisher Kernel. In: Sanjurjo González, H., Pastor López, I., García Bringas, P., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2021. Lecture Notes in Computer Science(), vol 12886. Springer, Cham. https://doi.org/10.1007/978-3-030-86271-8_38
Print ISBN: 978-3-030-86270-1
Online ISBN: 978-3-030-86271-8