Beyond hybrid generative discriminative learning: spherical data classification

Abstract

The blending of generative and discriminative approaches has prevailed by exploring and adopting the distinct characteristics of each approach toward constructing a complementary system that combines the best of both. The majority of current research in classification and categorization does not completely address the true structure and nature of data for the particular application at hand. In contrast to most previous research, our proposed work focuses on the modeling and classification of spherical data that are naturally generated in many data mining and knowledge discovery applications such as text classification, visual scene categorization and gene expression analysis. This paper investigates a generative mixture model to cluster spherical data based on the Langevin distribution. In particular, we formulate a unified probabilistic framework in which we build probabilistic kernels, based on the Fisher score and information divergences, from mixtures of Langevin distributions for support vector machines. We demonstrate the effectiveness and the merits of the proposed learning framework through synthetic data and challenging applications involving spam filtering using both textual and visual email contents.

Notes

  1. In particular, the authors in [10] strongly recommended the normalization of data in feature space when considering SVMs and showed that normalization leads to considerably superior generalization performance.

  2. More details and thorough discussions about the statistics of spherical data in particular and directional data in general can be found in [19].

  3. Also known as the circular normal distribution [22].

  4. Other approaches are also possible. For instance, in [35] the authors used a mixture of von Mises distributions, with maximum likelihood for parameter estimation and a bootstrap likelihood ratio approach to assess the optimal number of components, and applied it to the study of sudden infant death syndrome.

  5. The superscript * denotes the optimal values of the cost function.

  6. This localized data representation alleviates many problems associated with representing data in complex applications (e.g., video categorization), such as data sparsity and the curse of dimensionality.

  7. http://www.csie.ntu.edu.tw/~cjlin/libsvm/.

  8. http://www.cs.cmu.edu/~enron/.

  9. The Tesseract OCR engine (open sourced by Google): http://code.google.com/p/tesseract-ocr/.

  10. In [64], the threshold \(t\) was set to 0.5, 0.9 and 0.999, respectively, where \(t= \frac{\lambda}{1 + \lambda}\).

  11. Available at http://www.princeton.edu/cass/spam/spam_bench/.

References

  1. Podolak IT, Roman A (2011) Cores: fusion of supervised and unsupervised training methods for a multi-class classification problem. Pattern Anal Appl 14(4):395–413

  2. Yang B, Chen S, Wu X (2011) A structurally motivated framework for discriminant analysis. Pattern Anal Appl 14(4):349–367

  3. Vapnik VN (2000) The nature of statistical learning theory, 2nd edn. Springer, Berlin

  4. Bishop CM (2006) Pattern recognition and machine learning, 1st edn. Springer, Berlin

  5. Jebara T, Kondor R, Howard A (2004) Probability product kernels. J Mach Learn Res 5:819–844

  6. Ng AY, Jordan MI (2001) On discriminative vs generative classifiers: a comparison of logistic regression and naive Bayes. In: Proceedings of 4th conference on advances in neural information processing systems. MIT Press, Cambridge, pp 841–848

  7. Raina R, Shen Y, Ng AY, McCallum A (2003) Classification with hybrid generative/discriminative models. In: Proceedings of 16th conference on advances in neural information processing systems. MIT Press

  8. Bosch A, Zisserman A, Muñoz X (2008) Scene classification using a hybrid generative/discriminative approach. IEEE Trans Pattern Anal Mach Intell 30(4):712–727

  9. Prevost L, Oudot L, Moises A, Michel-Sendis C, Milgram M (2005) Hybrid generative/discriminative classifier for unconstrained character recognition. Pattern Recogn Lett 26(12):1840–1848

  10. Herbrich R, Graepel T (2000) A PAC-Bayesian margin bound for linear classifiers: why SVMs work. In: Proceedings of advances in neural information processing systems (NIPS). pp 224–230

  11. Wittel GL, Wu SF (2004) On attacking statistical spam filters. In: Proceedings of the first conference on email and anti-spam (CEAS). California, USA

  12. Amayri O, Bouguila N (2010) A study of spam filtering using support vector machines. Artif Intell Rev 34(1):73–108

  13. Graf AB, Smola AJ, Borer S (2003) Classification in a normalized feature space using support vector machines. IEEE Trans Neural Netw 14(3):597–605

  14. Wallace CS (2005) Statistical and inductive inference by minimum message length. Springer, Berlin

  15. Wallace CS, Dowe DL (2000) MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions. Stat Comput 10(1):73–83

  16. Dowe D, Oliver J, Wallace C (1996) MML estimation of the parameters of the spherical fisher distribution. In: Arikawa S, Sharma A (eds) Proceedings of the conference on algorithmic learning theory (ALT). Lecture Notes in computer science, vol 1160. Springer, Berlin, pp 213–227

  17. Leopold E, Kindermann J (2002) Text categorization with support vector machines. How to represent texts in input space? Mach Learn 46(1–3):423–444

  18. Joachims T (1998) Text categorization with support vector machines: learning with many relevant features. In: Nédellec C, Rouveirol C (eds) Proceedings of ECML-98, 10th European conference on machine learning, number 1398. Chemnitz, DE. Springer, Berlin, pp 137–142

  19. Mardia KV (1975) Statistics of directional data (with discussions). J R Stat Soc Series B (Methodol) 37(3):349–393

  20. Mardia KV (1972) Statistics of directional data. Academic Press, Waltham

  21. Watson GS (1983) Statistics on spheres. Wiley, New York

  22. Fisher NI (1993) Statistical analysis of circular data, 1st edn. Cambridge University Press, Cambridge

  23. Fisher NI, Embleton BJJ, Lewis T (1993) Statistical analysis of spherical data. Cambridge University Press, Cambridge

  24. McGraw T, Vemuri BC, Yezierski B, Mareci T (2006) von Mises–Fisher mixture model of the diffusion ODF. In: Proceedings of 3rd IEEE international symposium on biomedical imaging: from nano to macro, Arlington, VA, pp 65–68

  25. Tang H, Chu SM, Huang TS (2009) Generative model-based speaker clustering via mixture of von Mises–Fisher distributions. In: Proceedings of IEEE international conference on acoustics, speech, and signal processing, Los Alamitos, CA, USA. IEEE Computer Society, pp 4101–4104

  26. Banerjee A, Dhillon IS, Ghosh J, Sra S (2005) Clustering on the unit hypersphere using von Mises–Fisher distributions. J Mach Learn Res 6:1345–1382

  27. Mardia KV, Zemroch PJ (1975) Algorithm as 81: circular statistics. Appl Stat 24(1):147–150

  28. Mardia KV, Zemroch PJ (1975) Algorithm as 80: spherical statistics. Appl Stat 24(1):144–146

  29. McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York

  30. Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42(1–2):143–175

  31. Dhillon I, Fan J, Guan Y (2001) Efficient Clustering of very large document collections. Kluwer, New York

  32. Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automatic Control 19(6):716–723

  33. Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464

  34. Rissanen J (1978) Modeling by shortest data description. Automatica 14:465–471

  35. Mooney JA, Helms PJ, Jolliffe IT (2003) Fitting mixtures of von Mises distributions: a case study involving sudden infant death syndrome. Comput Stat Data Anal 41:505–513

  36. Bouguila N, Ziou D (2006) Unsupervised selection of a finite Dirichlet mixture model: an MML-based approach. IEEE Trans Knowl Data Eng 18(8):993–1009

  37. Bouguila N, Ziou D (2007) High-dimensional unsupervised selection and estimation of a finite generalized Dirichlet mixture model based on minimum message length. IEEE Trans Pattern Anal Mach Intell 29(10):1716–1731

  38. Mardia KV (1975) Distribution theory for the von Mises–Fisher distribution and its application. In: Kotz S, Patial GP, Ord JK (eds) Statistical distributions for scientific work, vol 1. pp 113–130

  39. Agarwal A, Daumé H (2011) Generative kernels for exponential families. In: Proc. of the international conference on artificial intelligence and statistics (AISTAT)

  40. Jaakkola TS, Haussler D (1998) Exploiting generative models in discriminative classifiers. In: Proceedings of advances in neural information systems (NIPS). MIT Press, Cambridge, pp 487–493

  41. Chan AB, Vasconcelos N, Moreno PJ (2004) A family of probabilistic kernels based on information divergence. Technical Report SVCL-TR2004/01, University of California, San Diego

  42. Bouguila N (2012) Hybrid generative/discriminative approaches for proportional data modeling and classification. IEEE Trans Knowl Data Eng 24:2184–2202

  43. Kullback S (1959) Information theory and statistics, 1st edn. Wiley, New York

  44. Moreno PJ, Ho PP, Vasconcelos N (2003) A Kullback–Leibler divergence based kernel for SVM classification in multimedia applications. In: Proceedings of advances in neural information processing systems (NIPS). MIT Press, Cambridge

  45. Hershey JR, Olsen PA (2007) Approximating the Kullback–Leibler divergence between Gaussian mixture models. In: Proceedings of IEEE international conference on acoustics, speech and signal processing (ICASSP), vol 4. pp 317–320

  46. Lin J (1991) Divergence measures based on the Shannon entropy. IEEE Trans Inf Theory 37(1):145–151

  47. Rényi A (1960) On measures of entropy and information. In: Proceedings of Berkeley symposium mathematical statistics and probability. pp 547–561

  48. Ulrich G (1984) Computer generation of distributions on the m-sphere. J Roy Stat Soc 33(2):158–163

  49. Wood ATA (1994) Simulation of the von Mises Fisher distribution. Commun Stat Simul Comput 23(1):157–164

  50. Blanzieri E, Bryl A (2008) A survey of learning-based techniques of email spam filtering. Artif Intell Rev 29:63–92

  51. Zhu Y, Tan Y (2011) A local-concentration-based feature extraction approach for spam filtering. IEEE Trans Inf Forensics Secur 6(2):486–497

  52. Özgür L, Güngör T (2012) Optimization of dependency and pruning usage in text classification. Pattern Anal Appl 15(1):45–58

  53. Cormack GV, Lynam TR (2007) Online supervised spam filter evaluation. ACM Trans Inf Syst 25:1–29

  54. Hershkop S, Stolfo SJ (2005) Combining email models for false positive reduction. In: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining (KDD). pp 98–107

  55. Chang M, Yih W, Meek C (2008) Partitioned logistic regression for spam filtering. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (KDD). pp 97–105

  56. Yoshida K, Adachi F, Washio T, Motoda H, Homma T, Nakashima A, Fujikawa H, Yamazaki K (2004) Density-based spam detector. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (KDD). pp 486–493

  57. Chirita P, Diederich J, Nejdl W (2005) Mailrank: using ranking for spam detection. In: Proceedings of the 14th ACM international conference on Information and knowledge management (CIKM). pp 373–380

  58. Tseng C, Huang J, Chen M (2007) Promail: using progressive email social network for spam detection. In: Zhou Z, Li H, Yang Q (eds) PAKDD. Lecture notes in computer science, vol 4426. Springer, Berlin, pp 833–840

  59. Wu CH (2009) Behavior-based spam detection using a hybrid method of rule-based techniques and neural networks. Expert Syst Appl 36(3):4321–4330

  60. Fumera G, Pillai I, Roli F (2006) Spam filtering based on the analysis of text information embedded into images. J Mach Learn Res 7:2699–2720

  61. Konstantinidis K, Vonikakis V, Panitsidis G, Andreadis I (2011) A center-surround histogram for content-based image retrieval. Pattern Anal Appl 14(3):251–260

  62. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vision 60(2):91–110

  63. Androutsopoulos I, Koutsias J, Chandrinos KV, Spyropoulos CD (2000) An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal e-mail messages. In: Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval. ACM Press, New York, pp 160–167

  64. Sahami M, Dumais S, Heckerman D, Horvitz E (1998) A Bayesian approach to filtering junk e-mail. In: Proceedings of National Conference on artificial intelligence

  65. Cormack GV, Lynam TR (2005) TREC 2005 spam track overview. In: Proceedings of the fourteenth text retrieval conference (TREC05), Gaithersburg, MD

  66. Mehta B, Nangia S, Gupta M, Nejdl W (2008) Detecting image spam using visual features and near duplicate detection. In: Proceedings of the 17th international conference on World Wide Web (WWW). pp 497–506

  67. Dredze M, Gevaryahu R, Elias-Bachrach A (2007) Learning fast classifiers for image spam. In: Proceedings of the 4th Conference on Email and Anti-Spam (CEAS). pp 487–493

  68. Kailath T (1967) The divergence and Bhattacharyya distance measures in signal selection. IEEE Trans Commun Technol 15(1):52–60

Acknowledgments

The completion of this research was made possible thanks to the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Corresponding author

Correspondence to Nizar Bouguila.

Appendices

Appendix 1: Proof of Eq. 30

In the case of the Langevin model, we can show that

$$ \begin{aligned} &\int\limits_{\Upomega} p({\bf X}|\Uptheta)^{\rho} q({\bf X}|\acute{\Uptheta})^{\rho} \,{\rm d}{\bf X} \\ &\quad= \left[ \left( \frac{\kappa}{2} \right)^{\frac{p}{2}-1} \frac{1}{(2 \pi)^{\frac{p}{2}} I_{\frac{p}{2}-1}(\kappa)} \right]^{\rho} \left[ \left( \frac{\acute{\kappa}}{2} \right)^{\frac{p}{2}-1} \frac{1}{(2 \pi)^{\frac{p}{2}} I_{\frac{p}{2}-1}(\acute{\kappa})} \right]^{\rho}\int\limits_{\Upomega} \left( e^{\kappa {\varvec{\mu}}^{T}{\bf X}}\right)^{\rho} \left( e^{\acute{\kappa} \acute{{\varvec{\mu}}}^{T}{\bf X}}\right)^{\rho} \,{\rm d}{\bf X} \\ &\quad= \left[ \left( \frac{\kappa}{2} \right)^{\frac{p}{2}-1} \frac{1}{(2 \pi)^{\frac{p}{2}} I_{\frac{p}{2}-1}(\kappa)} \right]^{\rho} \left[ \left( \frac{\acute{\kappa}}{2} \right)^{\frac{p}{2}-1} \frac{1}{(2 \pi)^{\frac{p}{2}} I_{\frac{p}{2}-1}(\acute{\kappa})} \right]^{\rho}\int\limits _{\Upomega} e^{(\kappa {\varvec{\mu}}+ \acute{\kappa} {\acute{\varvec{\mu}}})^{T}{\bf X} \rho } \,{\rm d}{\bf X} \end{aligned} $$
(47)

The product of two Langevin distributions can be written as

$$ M_{p}({\bf X}| {\varvec{\mu}},\kappa)M_{p}({\bf X}| {\acute{\varvec{\mu}}},{\acute{\kappa}})\propto M_{p}({\bf X}| \tau_{{\varvec{\mu}}, {\acute{\varvec{\mu}}}},\xi_{\kappa, \acute{\kappa}}) $$

where

$$ \begin{aligned} \xi_{\kappa, \acute{\kappa}}&= \sqrt{ \kappa^2+\acute{\kappa}^2+2\kappa\acute{\kappa}({\varvec{\mu}}\cdot \acute{{\varvec{\mu}}})}\\ \tau_{{{\varvec{\mu}}}, \acute{{\varvec{\mu}}}} &= \frac{\kappa{{\varvec{\mu}}}+\acute{\kappa}\acute{{\varvec{\mu}}}}{\xi_{\kappa, \acute{\kappa}}} \end{aligned} $$
(48)

Using Eq. 48 and the Langevin normalization integral, we obtain

$$ \int\limits_{\Upomega} p({\bf X}|\Uptheta)^{\rho} q({\bf X}|\acute{\Uptheta})^{\rho} \,{\rm d}{\bf X}= \left[ \left( \frac{\kappa \acute{\kappa}}{4} \right)^{\frac{p}{2}-1} \frac{1}{(2 \pi)^{p} I_{\frac{p}{2}-1}(\kappa)I_{\frac{p}{2}-1}(\acute{\kappa})} \right]^{\rho} \left[ \frac{(2 \pi)^{\frac{p}{2}} I_{\frac{p}{2}-1}(\xi_{\kappa, \acute{\kappa}} \rho)}{(\xi_{\kappa, \acute{\kappa}} \rho)^{\frac{p}{2}-1}} \right] $$
(49)
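
As a sanity check, Eq. 49 can be evaluated numerically. The following Python sketch is an illustrative transcription of Eqs. 47–49, not code from the paper: the Langevin normalizing constant is taken exactly as written in Eq. 47, scipy.special.ive is used for numerically stable modified Bessel evaluations, and all function names (log_iv, log_norm_const, product_kernel) are our own.

```python
# Illustrative transcription of Eqs. 47-49 (not the authors' code).  The
# Langevin normalizing constant below mirrors Eq. 47 exactly as printed.
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function I_v


def log_iv(order, x):
    # log I_order(x), computed as log(ive(order, x)) + x for numerical stability
    return np.log(ive(order, x)) + x


def log_norm_const(kappa, p):
    # log of (kappa/2)^(p/2-1) / ((2*pi)^(p/2) I_{p/2-1}(kappa)), as in Eq. 47
    nu = p / 2.0 - 1.0
    return nu * np.log(kappa / 2.0) - (p / 2.0) * np.log(2.0 * np.pi) - log_iv(nu, kappa)


def product_kernel(mu1, kappa1, mu2, kappa2, rho=0.5):
    """Probability product kernel of Eq. 49; rho = 0.5 gives a Bhattacharyya-type kernel."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    p = mu1.size
    nu = p / 2.0 - 1.0
    # Eq. 48: concentration of the (unnormalized) product of the two densities
    xi = np.sqrt(kappa1 ** 2 + kappa2 ** 2 + 2.0 * kappa1 * kappa2 * mu1.dot(mu2))
    log_k = rho * (log_norm_const(kappa1, p) + log_norm_const(kappa2, p))
    # remaining Langevin integral: (2*pi)^(p/2) I_{p/2-1}(rho*xi) / (rho*xi)^(p/2-1)
    log_k += (p / 2.0) * np.log(2.0 * np.pi) + log_iv(nu, rho * xi) - nu * np.log(rho * xi)
    return np.exp(log_k)


# Example: two unit mean directions in R^3 with different concentrations
print(product_kernel([1.0, 0.0, 0.0], 5.0, [0.0, 1.0, 0.0], 8.0, rho=0.5))
```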

Appendix 2: Proof of Eq. 36

The KL divergence between two distributions from the exponential family is given by [68]

$$ KL(p({\bf X}|\Uptheta), q({\bf X}|\acute{\Uptheta}))= \Upphi(\theta) - \Upphi(\acute{\theta}) + [G(\theta) - G(\acute{\theta})]^T E_{\theta}[T({\bf X})] $$
(50)

where \(E_{\theta}\) denotes the expectation with respect to \(p({\bf X}|\Uptheta)\), \(G(\theta)=(G_{1}(\theta), \ldots, G_{l}(\theta))\), \(T({\bf X})= (T_{1}({\bf X}), \ldots, T_{l}({\bf X}))\), l is the number of parameters of the distribution, and the superscript T denotes the transpose. Furthermore, we have the following:

$$ E_{\theta}[T({\bf X})]= - \Upphi'(\theta) $$
(51)

Letting \(\Upphi(\theta)= - a_{p}(\kappa)\) and \(G(\theta)= \kappa{\varvec{\mu}}\), the KL divergence between two Langevin distributions is given by

$$ KL(p({\bf X}|\Uptheta), q({\bf X}|\acute{\Uptheta}))= \log \frac{\kappa^{\frac{p}{2}-1}}{(2 \pi)^{\frac{p}{2}}I_{\frac{p}{2}-1}(\kappa)} - \log \frac{\acute{\kappa}^{\frac{p}{2}-1}}{(2 \pi)^{\frac{p}{2}}I_{\frac{p}{2}-1}(\acute{\kappa})}+ [\kappa{\varvec{\mu}} - \acute{\kappa}{\acute{\varvec{\mu}}} ]^T a'_{p}(\kappa){\varvec{\mu}} $$
(52)
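
A hedged numerical transcription of Eq. 52 follows (this is not the authors' implementation). It assumes that \(a'_{p}(\kappa)\), the derivative of the Langevin log-partition \(a_{p}(\kappa)\), equals \(I_{p/2}(\kappa)/I_{p/2-1}(\kappa)\); the function names are illustrative.

```python
# Illustrative transcription of Eq. 52 (not the authors' code).  We assume that
# a'_p(kappa), the derivative of the Langevin log-partition a_p(kappa), equals
# I_{p/2}(kappa) / I_{p/2-1}(kappa); all names are our own.
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function I_v


def log_iv(order, x):
    return np.log(ive(order, x)) + x


def a_prime(kappa, p):
    # derivative of the log-partition: I_{p/2}(kappa) / I_{p/2-1}(kappa)
    return np.exp(log_iv(p / 2.0, kappa) - log_iv(p / 2.0 - 1.0, kappa))


def kl_langevin(mu1, kappa1, mu2, kappa2):
    """KL divergence between two Langevin densities, following Eq. 52."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    p = mu1.size
    nu = p / 2.0 - 1.0

    def log_c(kappa):
        # log of kappa^(p/2-1) / ((2*pi)^(p/2) I_{p/2-1}(kappa))
        return nu * np.log(kappa) - (p / 2.0) * np.log(2.0 * np.pi) - log_iv(nu, kappa)

    return (log_c(kappa1) - log_c(kappa2)
            + (kappa1 * mu1 - kappa2 * mu2).dot(a_prime(kappa1, p) * mu1))


print(kl_langevin([1.0, 0.0, 0.0], 5.0, [0.0, 1.0, 0.0], 8.0))
```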

Appendix 3: Proof of Eq. 42

In the case of the Langevin model, we can show that the Shannon entropy is given by

$$ \begin{aligned} H[p({\bf X}|\theta)]&= - \int\limits_{\Upomega} p({\bf X}|\theta) \log p({\bf X}|\theta) \,{\rm d}{\bf X}\\ &= - \int\limits_{\Upomega} p({\bf X}|\theta) \left[ \sum_{p=1}^{P} \kappa {\varvec{\mu}}^{T} {\bf X}-a_{p}(\kappa)\right] \,{\rm d}{\bf X}\\ &= - \left[ E_{\theta}\left[\sum_{p=1}^{P} \kappa {\varvec{\mu}}^{T} {\bf X}\right]- a_{p}(\kappa)\right]\\ &= a_{p}(\kappa) - \kappa a'_{p}(\kappa) {\varvec{\mu}}^{T}{\varvec{\mu}} \end{aligned} $$
(53)

Substituting Eq. 53 into Eq. 41, we obtain the Jensen–Shannon divergence for the Langevin model.
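
For illustration only, the entropy of Eq. 53 can be computed as sketched below, assuming \(\Vert{\varvec{\mu}}\Vert=1\) and taking \(a_{p}(\kappa)\) to be the Langevin log-partition (minus the log normalizing constant); this is not the authors' implementation and the names are our own.

```python
# Illustrative numerical check of Eq. 53 (not the authors' code).  a_p(kappa) is
# taken to be the Langevin log-partition, i.e. minus the log normalizing constant,
# and a'_p(kappa) its derivative I_{p/2}(kappa) / I_{p/2-1}(kappa); ||mu|| = 1.
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function I_v


def log_iv(order, x):
    return np.log(ive(order, x)) + x


def langevin_entropy(kappa, p):
    """Shannon entropy of a p-dimensional Langevin density (Eq. 53)."""
    nu = p / 2.0 - 1.0
    a_p = (p / 2.0) * np.log(2.0 * np.pi) + log_iv(nu, kappa) - nu * np.log(kappa)
    a_prime = np.exp(log_iv(nu + 1.0, kappa) - log_iv(nu, kappa))
    return a_p - kappa * a_prime


# The entropy decreases as the concentration parameter kappa grows
for kappa in (0.5, 2.0, 10.0):
    print(kappa, langevin_entropy(kappa, p=3))
```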

Appendix 4: Proof of Eq. 44

To obtain the Rényi divergence between two Langevin distributions, we can show that

$$ \begin{aligned} &\int\limits_{\Upomega} p({\bf X}|\Uptheta)^{\sigma} q({\bf X}|\acute{\Uptheta})^{1-\sigma} \,{\rm d}{\bf X}\\ &\quad= \left[ \left( \frac{\kappa}{2} \right)^{\frac{p}{2}-1} \frac{1}{(2 \pi)^{\frac{p}{2}} I_{\frac{p}{2}-1}(\kappa)} \right]^{\sigma} \left[ \left( \frac{\acute{\kappa}}{2} \right)^{\frac{p}{2}-1} \frac{1}{(2 \pi)^{\frac{p}{2}} I_{\frac{p}{2}-1}(\acute{\kappa})} \right]^{1-\sigma}\int\limits _{\Upomega} \left( e^{\kappa {\varvec{\mu}}^{T}{\bf X}}\right)^{\sigma} \left( e^{\acute{\kappa} {\acute{\varvec{\mu}}}^{T}{\bf X}}\right)^{1-\sigma} \,{\rm d}{\bf X}\\ &\quad= \left[ \left( \frac{\kappa}{2} \right)^{\frac{p}{2}-1} \frac{1}{(2 \pi)^{\frac{p}{2}} I_{\frac{p}{2}-1}(\kappa)} \right]^{\sigma} \left[ \left( \frac{\acute{\kappa}}{2} \right)^{\frac{p}{2}-1} \frac{1}{(2 \pi)^{\frac{p}{2}} I_{\frac{p}{2}-1}(\acute{\kappa})} \right]^{1-\sigma}\int\limits _{\Upomega} e^{(\kappa {\varvec{\mu}}^{T}{\bf X} \sigma + \acute{\kappa}{\acute{\varvec{\mu}}}^{T}{\bf X}(1-\sigma))}\,{\rm d}{\bf X} \end{aligned} $$
(54)

Let \(\zeta_{\kappa,\acute{\kappa}}= \sqrt{(\sigma\kappa)^2+((1-\sigma)\acute{\kappa})^2+2\sigma\kappa(1-\sigma)\acute{\kappa}({\varvec{\mu}}\cdot {\acute{\varvec{\mu}}})}\) and \(\psi_{{\varvec{\mu}},\acute{\varvec{\mu}}}= \frac{\sigma\kappa{\varvec{\mu}}+(1-\sigma)\acute{\kappa}{\acute{\varvec{\mu}}}}{\zeta_{\kappa,\acute{\kappa}}}\); hence

$$ \int\limits_{\Upomega} p({\bf X})^{\sigma} q({\bf X})^{1-\sigma} \,{\rm d}{\bf X}= \left[ \left( \frac{\kappa}{2} \right)^{\frac{p}{2}-1} \frac{1}{(2 \pi)^{\frac{p}{2}} I_{\frac{p}{2}-1}(\kappa)} \right]^{\sigma}\left[ \left( \frac{\acute{\kappa}}{2} \right)^{\frac{p}{2}-1} \frac{1}{(2 \pi)^{\frac{p}{2}} I_{\frac{p}{2}-1}(\acute{\kappa})} \right]^{1-\sigma} \left[ \frac{(2 \pi)^{\frac{p}{2}} I_{\frac{p}{2}-1}(\zeta_{\kappa,\acute{\kappa}})}{(\zeta_{\kappa,\acute{\kappa}})^{\frac{p}{2}-1}} \right] $$
(55)

By substituting Eq. 55 into Eq. 43, we obtain the symmetric Rényi divergence between two Langevin distributions.
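
The following sketch transcribes Eq. 55 numerically (again, not the authors' code): the normalizing constant is taken as written in Eq. 54, and, since Eq. 43 is not reproduced here, the last line only illustrates the standard Rényi divergence of order \(\sigma\), \(D_{\sigma}=\frac{1}{\sigma-1}\log\int p^{\sigma}q^{1-\sigma}\), as an example; all names are our own.

```python
# Illustrative transcription of Eq. 55 (not the authors' code).  The normalizing
# constant is taken as written in Eq. 54.  Since Eq. 43 is not reproduced here,
# the last line only shows the standard Renyi divergence of order sigma.
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function I_v


def log_iv(order, x):
    return np.log(ive(order, x)) + x


def log_renyi_integral(mu1, kappa1, mu2, kappa2, sigma):
    """log of the integral of p(X)^sigma q(X)^(1-sigma) in Eq. 55."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    p = mu1.size
    nu = p / 2.0 - 1.0

    def log_c(kappa):
        # log of (kappa/2)^(p/2-1) / ((2*pi)^(p/2) I_{p/2-1}(kappa)), as in Eq. 54
        return nu * np.log(kappa / 2.0) - (p / 2.0) * np.log(2.0 * np.pi) - log_iv(nu, kappa)

    # zeta: concentration of the weighted geometric mean of the two densities
    zeta = np.sqrt((sigma * kappa1) ** 2 + ((1.0 - sigma) * kappa2) ** 2
                   + 2.0 * sigma * (1.0 - sigma) * kappa1 * kappa2 * mu1.dot(mu2))
    return (sigma * log_c(kappa1) + (1.0 - sigma) * log_c(kappa2)
            + (p / 2.0) * np.log(2.0 * np.pi) + log_iv(nu, zeta) - nu * np.log(zeta))


# Renyi divergence of order sigma: D_sigma(p||q) = log(integral) / (sigma - 1)
sigma = 0.7
log_i = log_renyi_integral([1.0, 0.0, 0.0], 5.0, [0.0, 1.0, 0.0], 8.0, sigma)
print(log_i / (sigma - 1.0))
```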

About this article

Cite this article

Amayri, O., Bouguila, N. Beyond hybrid generative discriminative learning: spherical data classification. Pattern Anal Applic 18, 113–133 (2015). https://doi.org/10.1007/s10044-013-0323-0
