A Non-parametric Fisher Kernel

  • Conference paper

Hybrid Artificial Intelligent Systems (HAIS 2021)

Abstract

In this manuscript, we derive a non-parametric version of the Fisher kernel. We obtain this original result from the Non-negative Matrix Factorization with the Kullback-Leibler divergence. By imposing suitable normalization conditions on the obtained factorization, it can be interpreted as a mixture of densities, with no assumptions on the distribution of the parameters. The equivalence between the Kullback-Leibler divergence and the log-likelihood then yields the kernel by simply taking partial derivatives. The estimates provided by this kernel retain the consistency of the Fisher kernel.
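
To make the pipeline described above concrete, here is a minimal sketch in Python/NumPy under our own simplifying assumptions: multiplicative-update NMF for the generalized KL divergence, Fisher scores taken as partial derivatives of the log-likelihood with respect to the mixture weights, and an identity Fisher information matrix. All names are illustrative; this is not the authors' implementation.

import numpy as np

def nmf_kl(X, k, n_iter=200, eps=1e-12, seed=0):
    # Multiplicative updates minimizing the generalized KL (I-) divergence
    # between X and W @ H (Lee-Seung-style updates).
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        W *= ((X / (W @ H + eps)) @ H.T) / H.sum(axis=1)
        H *= (W.T @ (X / (W @ H + eps))) / W.sum(axis=0)[:, None]
    return W, H

def fisher_kernel(X, k=5):
    # Fisher score of row i w.r.t. the mixture weights W[i, :]:
    # d log L(x_i) / d W_ik = sum_j x_ij H_kj / (WH)_ij, up to an
    # additive term that does not depend on the data.
    W, H = nmf_kl(X, k)
    U = (X / (W @ H + 1e-12)) @ H.T   # score vectors, one per row of X
    return U @ U.T                    # identity information matrix assumed

# Toy usage on synthetic count data
X = np.random.default_rng(1).poisson(3.0, size=(20, 30)).astype(float)
K = fisher_kernel(X)
print(K.shape)   # (20, 20)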

Notes

  1.

    In the NMF framework it is more convenient to write the matrix \(\mathbf {X}=(x_{ij})\) (with the subscripts i and j defined in (1)) as

    $$\begin{aligned} \mathbf {X} = [\mathbf {X}]_{ij} \end{aligned}$$

    This notation is convenient when some index varies (normally k in (4)). When the matrix operations do not depend on the varying index, eliding the subscripts simplifies the notation (as happens for diagonal matrices). All other objects are denoted as usual: matrices by bold capital letters and vectors by bold lowercase letters.

  2.

    Several authors refer to the KL divergence as

    $$\begin{aligned} D_{KL}(\mathbf {Y}\,\Vert \,\mathbf {W\,H}) = \sum _{ij}\Big (\,[\mathbf {Y}]_{ij}\log \frac{[\mathbf {Y}]_{ij}}{[\mathbf {W\,H}]_{ij}}-[\mathbf {Y}]_{ij}+[\mathbf {W\,H}]_{ij}\Big ) \end{aligned}$$

    which we prefer to call the I-divergence or generalized KL divergence, following [6, p. 105], reserving the name KL divergence for the mean information (given by formula (6)), in keeping with the original terminology of S. Kullback and R.A. Leibler [20].
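
    As an aside (a standard identity, not part of the original note): the two definitions coincide exactly under the normalization imposed in the paper, since

    $$\begin{aligned} D_{I}(\mathbf {Y}\,\Vert \,\mathbf {W\,H}) - \sum _{ij}[\mathbf {Y}]_{ij}\log \frac{[\mathbf {Y}]_{ij}}{[\mathbf {W\,H}]_{ij}} = \sum _{ij}[\mathbf {W\,H}]_{ij} - \sum _{ij}[\mathbf {Y}]_{ij} \end{aligned}$$

    and the right-hand side vanishes whenever both totals are equal (in particular when each is normalized to one), so the I-divergence reduces to the mean-information form of [20].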

References

  1. Amari, S.: Information Geometry and its Applications. Springer, Tokyo (2016). https://doi.org/10.1007/978-4-431-55978-8

  2. Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68(3), 337–404 (1950)

  3. Bredensteiner, E.J., Bennett, K.P.: Multicategory classification by support vector machines. In: Pang, J.S. (ed.) Computational Optimization, pp. 53–79. Springer, Boston (1999). https://doi.org/10.1007/978-1-4615-5197-3_5

  4. Chappelier, J.-C., Eckard, E.: PLSI: the true Fisher kernel and beyond. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds.) ECML PKDD 2009. LNCS (LNAI), vol. 5781, pp. 195–210. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04180-8_30

  5. Chen, J.C.: The nonnegative rank factorizations of nonnegative matrices. Linear Algebra Appl. 62, 207–217 (1984)

  6. Cichocki, A., Zdunek, R., Phan, A.H., Amari, S.I.: Nonnegative Matrix and Tensor Factorizations. Wiley, Hoboken (2009)

  7. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995)

  8. Ding, C., Li, T., Peng, W.: On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing. Comput. Statist. Data Anal. 52(8), 3913–3927 (2008)

  9. Ding, C., He, X., Simon, H.D.: On the equivalence of nonnegative matrix factorization and spectral clustering. In: Proceedings of the 2005 SIAM International Conference on Data Mining, pp. 606–610 (2005). https://doi.org/10.1137/1.9781611972757.70

  10. Dua, D., Graff, C.: UCI machine learning repository (2017). School of Information and Computer Sciences, University of California, Irvine. http://archive.ics.uci.edu/ml

  11. Durgesh, K.S., Lekha, B.: Data classification using support vector machine. J. Theor. Appl. Inf. Technol. 12(1), 1–7 (2010)

  12. Elkan, C.: Deriving TF-IDF as a Fisher kernel. In: Consens, M., Navarro, G. (eds.) SPIRE 2005. LNCS, vol. 3772, pp. 295–300. Springer, Heidelberg (2005). https://doi.org/10.1007/11575832_33

  13. Figuera, P., García Bringas, P.: On the probabilistic latent semantic analysis generalization as the singular value decomposition probabilistic image. J. Stat. Theory Appl. 19, 286–296 (2020). https://doi.org/10.2991/jsta.d.200605.001

  14. Franke, B., et al.: Statistical inference, learning and models in big data. Int. Stat. Rev. 84(3), 371–389 (2016)

  15. Hofmann, T.: Learning the similarity of documents: an information-geometric approach to document retrieval and categorization. In: Advances in Neural Information Processing Systems, pp. 914–920 (2000)

  16. Hofmann, T., Schölkopf, B., Smola, A.J.: Kernel methods in machine learning. Ann. Stat. 36, 1171–1220 (2008)

  17. Hofmann, T.: Unsupervised learning by probabilistic latent semantic analysis. Mach. Learn. 42(1–2), 177–196 (2001)

  18. Hsu, C.W., Lin, C.J.: A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Netw. 13(2), 415–425 (2002)

  19. Jaakkola, T.S., Haussler, D., et al.: Exploiting generative models in discriminative classifiers. In: Advances in Neural Information Processing Systems, pp. 487–493 (1999)

  20. Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)

  21. Latecki, L.J., Sobel, M., Lakaemper, R.: New EM derived from Kullback-Leibler divergence. In: KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 267–276 (2006)

  22. Lee, H., Cichocki, A., Choi, S.: Kernel nonnegative matrix factorization for spectral EEG feature extraction. Neurocomputing 72(13–15), 3182–3190 (2009)

  23. Martens, J.: New insights and perspectives on the natural gradient method. J. Mach. Learn. Res. 21, 1–76 (2020)

  24. Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., Leisch, F.: e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien (2021). R package version 1.7-6. https://CRAN.R-project.org/package=e1071

  25. Naik, G.R.: Non-negative Matrix Factorization Techniques. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-48331-2

  26. Salcedo-Sanz, S., Rojo-Álvarez, J.L., Martínez-Ramón, M., Camps-Valls, G.: Support vector machines in engineering: an overview. Wiley Interdiscip. Rev. Data Mining Knowl. Discov. 4(3), 234–267 (2014)

  27. Tsuda, K., Akaho, S., Kawanabe, M., Müller, K.R.: Asymptotic properties of the Fisher kernel. Neural Comput. 16(1), 115–137 (2004)

  28. Zhang, D., Zhou, Z.-H., Chen, S.: Non-negative matrix factorization on kernels. In: Yang, Q., Webb, G. (eds.) PRICAI 2006. LNCS (LNAI), vol. 4099, pp. 404–412. Springer, Heidelberg (2006). https://doi.org/10.1007/978-3-540-36668-3_44

  29. Zhang, X.D.: Matrix Analysis and Applications. Cambridge University Press, Cambridge (2017)

Author information

Correspondence to Pau Figuera.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Figuera, P., Bringas, P.G. (2021). A Non-parametric Fisher Kernel. In: Sanjurjo González, H., Pastor López, I., García Bringas, P., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2021. Lecture Notes in Computer Science, vol 12886. Springer, Cham. https://doi.org/10.1007/978-3-030-86271-8_38

  • DOI: https://doi.org/10.1007/978-3-030-86271-8_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86270-1

  • Online ISBN: 978-3-030-86271-8
