Abstract
The blending of generative and discriminative approaches has prevailed through exploring and adopting the distinct characteristics of each approach toward constructing a complementary system that combines the best of both. The majority of current research in classification and categorization does not completely address the true structure and nature of the data for the particular application at hand. In contrast to most previous research, our proposed work focuses on the modeling and classification of spherical data that are naturally generated in many data mining and knowledge discovery applications such as text classification, visual scene categorization and gene expression analysis. This paper investigates a generative mixture model to cluster spherical data based on the Langevin distribution. In particular, we formulate a unified probabilistic framework in which we build probabilistic kernels, based on the Fisher score and on information divergences, from mixtures of Langevin distributions for support vector machines. We demonstrate the effectiveness and the merits of the proposed learning framework through synthetic data and challenging applications involving spam filtering using both textual and visual email contents.
Notes
In particular, the authors in [10] strongly recommended normalizing the data in feature space when considering SVMs and showed that normalization leads to considerably superior generalization performance.
More details and thorough discussions about the statistics of spherical data in particular and directional data in general can be found in [19].
Also known as the circular normal distribution [22].
Other approaches are also possible. For instance, the authors in [35] used a mixture of von Mises distributions, with maximum likelihood for parameter estimation and a bootstrap likelihood ratio approach to assess the optimal number of components, and applied it to study the problem of sudden infant death syndrome.
The superscript * denotes the optimal values of the cost function.
This localized data representation alleviates many problems associated with representing data in complex applications (e.g., video categorization), such as data sparsity and the curse of dimensionality.
Open sourced by Google: http://code.google.com/p/tesseract-ocr/.
In [64], the threshold \(t\) was set to 0.5, 0.9 and 0.999, respectively, where \(t = \frac{\lambda}{1 + \lambda}\) (i.e., \(\lambda = 1\), 9 and 999).
Available at http://www.princeton.edu/cass/spam/spam_bench/.
References
Podolak IT, Roman A (2011) Cores: fusion of supervised and unsupervised training methods for a multi-class classification problem. Pattern Anal Appl 14(4):395–413
Yang B, Chen S, Wu X (2011) A structurally motivated framework for discriminant analysis. Pattern Anal Appl 14(4):349–367
Vapnik VN (2000) The nature of statistical learning theory, 2nd edn. Springer, Berlin
Bishop CM (2006) Pattern recognition and machine learning, 1st edn. Springer, Berlin
Jebara T, Kondor R, Howard A (2004) Probability product kernels. J Mach Learn Res 5:819–844
Ng AY, Jordan MI (2001) On discriminative vs generative classifiers: a comparison of logistic regression and naive Bayes. In: Proceedings of 4th conference on advances in neural information processing systems. MIT Press, Cambridge, pp 841–848
Raina R, Shen Y, Ng AY, McCallum A (2003) Classification with hybrid generative/discriminative models. In: Proceedings of 16th conference on advances in neural information processing systems. MIT Press
Bosch A, Zisserman A, Muñoz X (2008) Scene classification using a hybrid generative/discriminative approach. IEEE Trans Pattern Anal Mach Intell 30(4):712–727
Prevost L, Oudot L, Moises A, Michel-Sendis C, Milgram M (2005) Hybrid generative/discriminative classifier for unconstrained character recognition. Pattern Recogn Lett 26(12):1840–1848
Herbrich R, Graepel T (2000) A PAC-Bayesian margin bound for linear classifiers: why SVMs work. In: Proceedings of advances in neural information processing systems (NIPS). pp 224–230
Wittel GL, Wu SF (2004) On attacking statistical spam filters. In: Proceedings of the first conference on email and anti-spam (CEAS). California, USA
Amayri O, Bouguila N (2010) A study of spam filtering using support vector machines. Artif Intell Rev 34(1):73–108
Graf AB, Smola AJ, Borer S (2003) Classification in a normalized feature space using support vector machines. IEEE Trans Neural Netw 14(3):597–605
Wallace CS (2005) Statistical and inductive inference by minimum message length. Springer, Berlin
Wallace CS, Dowe DL (2000) MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions. Stat Comput 10(1):73–83
Dowe D, Oliver J, Wallace C (1996) MML estimation of the parameters of the spherical Fisher distribution. In: Arikawa S, Sharma A (eds) Proceedings of the conference on algorithmic learning theory (ALT). Lecture notes in computer science, vol 1160. Springer, Berlin, pp 213–227
Leopold E, Kindermann J (2002) Text categorization with support vector machines. How to represent texts in input space? Mach Learn 46(1–3):423–444
Joachims T (1998) Text categorization with support vector machines: learning with many relevant features. In: Nédellec C, Rouveirol C (eds) Proceedings of ECML-98, 10th European conference on machine learning, Chemnitz, Germany. Lecture notes in computer science, vol 1398. Springer, Berlin, pp 137–142
Mardia KV (1975) Statistics of directional data (with discussions). J R Stat Soc Series B (Methodol) 37(3):349–393
Mardia KV (1972) Statistics of directional data. Academic Press, Waltham
Watson GS (1983) Statistics on spheres. Wiley, New York
Fisher NI (1993) Statistical analysis of circular data, 1st edn. Cambridge University Press, Cambridge
Fisher NI, Embleton BJJ, Lewis T (1993) Statistical analysis of spherical data. Cambridge University Press, Cambridge
McGraw T, Vemuri BC, Yezierski B, Mareci T (2006) von Mises–Fisher mixture model of the diffusion ODF. In: Proceedings of 3rd IEEE international symposium on biomedical imaging: from nano to macro, Arlington, VA, pp 65–68
Tang H, Chu SM, Huang TS (2009) Generative model-based speaker clustering via mixture of von Mises–Fisher distributions. In: Proceedings of IEEE international conference on acoustics, speech, and signal processing, Los Alamitos, CA, USA. IEEE Computer Society, pp 4101–4104
Banerjee A, Dhillon IS, Ghosh J, Sra S (2005) Clustering on the unit hypersphere using von Mises–Fisher distributions. J Mach Learn Res 6:1345–1382
Mardia KV, Zemroch PJ (1975) Algorithm as 81: circular statistics. Appl Stat 24(1):147–150
Mardia KV, Zemroch PJ (1975) Algorithm as 80: spherical statistics. Appl Stat 24(1):144–146
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York
Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42(1–2):143–175
Dhillon I, Fan J, Guan Y (2001) Efficient clustering of very large document collections. Kluwer, New York
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automatic Control 19(6):716–723
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Rissanen J (1978) Modeling by shortest data description. Automatica 14:465–471
Mooney JA, Helms PJ, Jolliffe IT (2003) Fitting mixtures of von Mises distributions: a case study involving sudden infant death syndrome. Comput Stat Data Anal 41:505–513
Bouguila N, Ziou D (2006) Unsupervised selection of a finite Dirichlet mixture model: an MML-based approach. IEEE Trans Knowl Data Eng 18(8):993–1009
Bouguila N, Ziou D (2007) High-dimensional unsupervised selection and estimation of a finite generalized Dirichlet mixture model based on minimum message length. IEEE Trans Pattern Anal Mach Intell 29(10):1716–1731
Mardia KV (1975) Distribution theory for the von Mises–Fisher distribution and its application. In: Kotz S, Patil GP, Ord JK (eds) Statistical distributions for scientific work, vol 1. pp 113–130
Agarwal A, Daumé H (2011) Generative kernels for exponential families. In: Proceedings of the international conference on artificial intelligence and statistics (AISTATS)
Jaakkola TS, Haussler D (1998) Exploiting generative models in discriminative classifiers. In: Proceedings of advances in neural information processing systems (NIPS). MIT Press, Cambridge, pp 487–493
Chan AB, Vasconcelos N, Moreno PJ (2004) A family of probabilistic kernels based on information divergence. Technical report SVCL-TR2004/01, University of California, San Diego
Bouguila N (2012) Hybrid generative/discriminative approaches for proportional data modeling and classification. IEEE Trans Knowl Data Eng 24:2184–2202
Kullback S (1959) Information theory and statistics, 1st edn. Wiley, New York
Moreno PJ, Ho PP, Vasconcelos N (2003) A Kullback–Leibler divergence based kernel for SVM classification in multimedia applications. In: Proceedings of advances in neural information processing systems (NIPS). MIT Press, Cambridge
Hershey JR, Olsen PA (2007) Approximating the Kullback–Leibler divergence between Gaussian mixture models. In: Proceedings of IEEE international conference on acoustics, speech and signal processing (ICASSP), vol 4. pp 317–320
Lin J (1991) Divergence measures based on the Shannon entropy. IEEE Trans Inf Theory 37(1):145–151
Rényi A (1960) On measures of entropy and information. In: Proceedings of the fourth Berkeley symposium on mathematical statistics and probability. pp 547–561
Ulrich G (1984) Computer generation of distributions on the m-sphere. J R Stat Soc Ser C (Appl Stat) 33(2):158–163
Wood ATA (1994) Simulation of the von Mises–Fisher distribution. Commun Stat Simul Comput 23(1):157–164
Blanzieri E, Bryl A (2008) A survey of learning-based techniques of email spam filtering. Artif Intell Rev 29:63–92
Zhu Y, Tan Y (2011) A local-concentration-based feature extraction approach for spam filtering. IEEE Trans Inf Forensics Secur 6(2):486–497
Özgür L, Güngör T (2012) Optimization of dependency and pruning usage in text classification. Pattern Anal Appl 15(1):45–58
Cormack GV, Lynam TR (2007) Online supervised spam filter evaluation. ACM Trans Inf Syst 25:1–29
Hershkop S, Stolfo SJ (2005) Combining email models for false positive reduction. In: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining (KDD). pp 98–107
Chang M, Yih W, Meek C (2008) Partitioned logistic regression for spam filtering. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (KDD). pp 97–105
Yoshida K, Adachi F, Washio T, Motoda H, Homma T, Nakashima A, Fujikawa H, Yamazaki K (2004) Density-based spam detector. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (KDD). pp 486–493
Chirita P, Diederich J, Nejdl W (2005) Mailrank: using ranking for spam detection. In: Proceedings of the 14th ACM international conference on Information and knowledge management (CIKM). pp 373–380
Tseng C, Huang J, Chen M (2007) Promail: using progressive email social network for spam detection. In: Zhou Z, Li H, Yang Q (eds) PAKDD. Lecture notes in computer science, vol 4426. Springer, Berlin, pp 833–840
Wu CH (2009) Behavior-based spam detection using a hybrid method of rule-based techniques and neural networks. Expert Syst Appl 36(3):4321–4330
Fumera G, Pillai I, Roli F (2006) Spam filtering based on the analysis of text information embedded into images. J Mach Learn Res 7:2699–2720
Konstantinidis K, Vonikakis V, Panitsidis G, Andreadis I (2011) A center-surround histogram for content-based image retrieval. Pattern Anal Appl 14(3):251–260
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vision 60(2):91–110
Androutsopoulos I, Koutsias J, Chandrinos KV, Spyropoulos CD (2000) An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal e-mail messages. In: Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval. ACM Press, New York, pp 160–167
Sahami M, Dumais S, Heckerman D, Horvitz E (1998) A Bayesian approach to filtering junk e-mail. In: Proceedings of National Conference on artificial intelligence
Cormack GV, Lynam TR (2005) TREC 2005 spam track overview. In: Proceedings of the fourteenth text retrieval conference (TREC 2005), Gaithersburg, MD
Mehta B, Nangia S, Gupta M, Nejdl W (2008) Detecting image spam using visual features and near duplicate detection. In: Proceedings of the 17th international conference on World Wide Web (WWW). pp 497–506
Dredze M, Gevaryahu R, Elias-Bachrach A (2007) Learning fast classifiers for image spam. In: Proceedings of the 4th Conference on Email and Anti-Spam (CEAS). pp 487–493
Kailath T (1967) The divergence and Bhattacharyya distance measures in signal selection. IEEE Trans Commun Technol 15(1):52–60
Acknowledgments
The completion of this research was made possible thanks to the Natural Sciences and Engineering Research Council of Canada (NSERC).
Appendices
Appendix 1: Proof of Eq. 30
In the case of the Langevin model, we can show that
The product of two Langevin distributions can be written as
where
Using Eq. 48 and the Langevin integral, we obtain
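A sketch of this derivation, assuming the standard Langevin (von Mises–Fisher) parameterization \(p(\mathbf{X}|\boldsymbol{\mu},\kappa)=c_p(\kappa)\exp(\kappa\boldsymbol{\mu}^{T}\mathbf{X})\) with normalizing constant \(c_p(\kappa)=\kappa^{p/2-1}/\big((2\pi)^{p/2}I_{p/2-1}(\kappa)\big)\), and taking the product kernel at \(\rho=1\) (the expected likelihood case); both the parameterization and the choice of \(\rho\) are assumptions about the notation of Eqs. 30 and 48:

\[
p(\mathbf{X}|\boldsymbol{\mu},\kappa)\,p(\mathbf{X}|\acute{\boldsymbol{\mu}},\acute{\kappa})
= c_p(\kappa)\,c_p(\acute{\kappa})\exp\big(\bar{\kappa}\,\bar{\boldsymbol{\mu}}^{T}\mathbf{X}\big),
\qquad
\bar{\kappa}=\big\|\kappa\boldsymbol{\mu}+\acute{\kappa}\acute{\boldsymbol{\mu}}\big\|,
\quad
\bar{\boldsymbol{\mu}}=\frac{\kappa\boldsymbol{\mu}+\acute{\kappa}\acute{\boldsymbol{\mu}}}{\bar{\kappa}},
\]

so that integrating over the unit hypersphere gives

\[
\int p(\mathbf{X}|\boldsymbol{\mu},\kappa)\,p(\mathbf{X}|\acute{\boldsymbol{\mu}},\acute{\kappa})\,d\mathbf{X}
= \frac{c_p(\kappa)\,c_p(\acute{\kappa})}{c_p(\bar{\kappa})},
\qquad
\bar{\kappa}=\sqrt{\kappa^{2}+\acute{\kappa}^{2}+2\kappa\acute{\kappa}\,(\boldsymbol{\mu}\cdot\acute{\boldsymbol{\mu}})}.
\]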
Appendix 2: Proof of Eq. 36
The KL divergence between two exponential-family distributions is given in [68] as
where \(E_{\theta}\) is the expectation with respect to \(p(\mathbf{X}|\Theta)\), \(G(\theta)=(G_{1}(\theta),\ldots,G_{l}(\theta))\), \(T(\mathbf{X})=(T_{1}(\mathbf{X}),\ldots,T_{l}(\mathbf{X}))\), \(l\) is the number of parameters of the distribution, and the superscript \(T\) denotes the transpose. Furthermore, we have the following:
Then, by letting \(\Phi_{\theta}=-a_{p}(\kappa)\) and \(G_{\theta}=\kappa\boldsymbol{\mu}\), the KL divergence between two Langevin distributions is given as
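A sketch of the resulting closed form, assuming \(T(\mathbf{X})=\mathbf{X}\), \(a_{p}(\kappa)=-\log c_{p}(\kappa)\), and \(E_{\theta}[\mathbf{X}]=A_{p}(\kappa)\boldsymbol{\mu}\) with mean resultant length \(A_{p}(\kappa)=I_{p/2}(\kappa)/I_{p/2-1}(\kappa)\):

\[
KL\big(p(\mathbf{X}|\boldsymbol{\mu},\kappa)\,\big\|\,p(\mathbf{X}|\acute{\boldsymbol{\mu}},\acute{\kappa})\big)
= a_{p}(\acute{\kappa})-a_{p}(\kappa)+A_{p}(\kappa)\big(\kappa-\acute{\kappa}\,\boldsymbol{\mu}^{T}\acute{\boldsymbol{\mu}}\big),
\]

which follows from the identity \(KL=(\Phi_{\theta}-\Phi_{\acute{\theta}})+(G_{\theta}-G_{\acute{\theta}})^{T}E_{\theta}[T(\mathbf{X})]\) and \(\boldsymbol{\mu}^{T}\boldsymbol{\mu}=1\).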
Appendix 3: Proof of Eq. 42
In the case of the Langevin model, we can show that the Shannon entropy is given by
Substituting Eq. 53 into Eq. 41, we obtain the Jensen–Shannon divergence for the Langevin model.
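A sketch of the entropy term, under the same parameterization as in Appendix 1 and with \(a_{p}(\kappa)=-\log c_{p}(\kappa)\) and \(A_{p}(\kappa)=I_{p/2}(\kappa)/I_{p/2-1}(\kappa)\) as above:

\[
H\big(p(\mathbf{X}|\boldsymbol{\mu},\kappa)\big)
= -E_{\theta}\big[\log p(\mathbf{X}|\boldsymbol{\mu},\kappa)\big]
= a_{p}(\kappa)-\kappa\,A_{p}(\kappa),
\]

using \(E_{\theta}[\boldsymbol{\mu}^{T}\mathbf{X}]=A_{p}(\kappa)\).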
Appendix 4: Proof of Eq. 44
We can show that the Rényi divergence between two Langevin distributions is given by
Assume that \(\zeta_{\kappa,\acute{\kappa}}=\sqrt{(\sigma\kappa)^{2}+((1-\sigma)\acute{\kappa})^{2}+2\sigma\kappa(1-\sigma)\acute{\kappa}\,(\boldsymbol{\mu}\cdot\acute{\boldsymbol{\mu}})}\) and \(\psi_{\boldsymbol{\mu},\acute{\boldsymbol{\mu}}}=\frac{\sigma\kappa\boldsymbol{\mu}+(1-\sigma)\acute{\kappa}\acute{\boldsymbol{\mu}}}{\zeta_{\kappa,\acute{\kappa}}}\), and hence
By substituting Eq. 55 into Eq. 43, we obtain the symmetric Rényi divergence for two Langevin distributions.
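A sketch of the divergence itself, assuming the Rényi divergence of order \(\sigma\) is defined as \(D_{\sigma}(p\,\|\,\acute{p})=\frac{1}{\sigma-1}\log\int p^{\sigma}\,\acute{p}^{\,1-\sigma}\,d\mathbf{X}\): since \(\sigma\kappa\boldsymbol{\mu}+(1-\sigma)\acute{\kappa}\acute{\boldsymbol{\mu}}=\zeta_{\kappa,\acute{\kappa}}\,\psi_{\boldsymbol{\mu},\acute{\boldsymbol{\mu}}}\) with \(\|\psi_{\boldsymbol{\mu},\acute{\boldsymbol{\mu}}}\|=1\), the Langevin integral gives

\[
D_{\sigma}\big(p(\mathbf{X}|\boldsymbol{\mu},\kappa)\,\big\|\,p(\mathbf{X}|\acute{\boldsymbol{\mu}},\acute{\kappa})\big)
= \frac{1}{\sigma-1}\log\frac{c_{p}(\kappa)^{\sigma}\,c_{p}(\acute{\kappa})^{1-\sigma}}{c_{p}(\zeta_{\kappa,\acute{\kappa}})}.
\]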
Cite this article
Amayri, O., Bouguila, N. Beyond hybrid generative discriminative learning: spherical data classification. Pattern Anal Applic 18, 113–133 (2015). https://doi.org/10.1007/s10044-013-0323-0