Kernel-Based Text Classification on Statistical Manifold

  • Conference paper
Advances in Neural Networks - ISNN 2008 (ISNN 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5263)

Abstract

Many useful kernel methods have been developed for text data. Most common kernel-based approaches to text categorization, however, embed the text in Euclidean space. In this paper, we instead represent text vectors as points on a Riemannian manifold and use kernels to integrate discriminative and generative models. We present a diffusion kernel based on the Dirichlet Compound Multinomial manifold (DCM manifold), a space induced by the Dirichlet Compound Multinomial model that combines inverse document frequency and information gain. Experimental results on several real-world text datasets show that the kernel based on the DCM manifold is better suited to text categorization than kernels based on Euclidean space, and that our kernel method achieves much better accuracy than some current state-of-the-art methods.
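The abstract does not give the form of the DCM kernel, so the following is only a sketch of the general idea it builds on: treating a document as a point on a statistical manifold and measuring similarity through geodesic distance instead of Euclidean distance. It uses the simpler multinomial manifold, where the Fisher-information geodesic distance has the closed form 2·arccos(Σ √(pᵢ qᵢ)), together with the first-order diffusion-kernel approximation exp(−d²/(4t)) with the normalising constant dropped. The function names, the additive smoothing value, and the diffusion time `t` are illustrative assumptions, not taken from the paper.

```python
import math

def to_simplex(counts, smoothing=1e-3):
    """Map raw term counts to a point on the probability simplex.

    The additive smoothing value is an assumption for illustration.
    """
    total = sum(counts) + smoothing * len(counts)
    return [(c + smoothing) / total for c in counts]

def geodesic_distance(p, q):
    """Fisher-information geodesic distance on the multinomial manifold."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to [0, 1] so rounding error cannot push acos out of its domain.
    return 2.0 * math.acos(min(1.0, max(0.0, bc)))

def diffusion_kernel(p, q, t=0.5):
    """First-order diffusion-kernel approximation exp(-d^2 / (4t)).

    The normalising constant is dropped; t is a hypothetical setting.
    """
    d = geodesic_distance(p, q)
    return math.exp(-d * d / (4.0 * t))

# Two toy documents over a three-term vocabulary (made-up counts).
doc_a = to_simplex([3, 0, 1])
doc_b = to_simplex([0, 4, 1])
print(diffusion_kernel(doc_a, doc_b))
```

A DCM-based variant would replace the multinomial geometry with one derived from the Dirichlet Compound Multinomial model; that construction is described in the paper itself and is not reproduced here.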

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhou, S., Feng, S., Liu, Y. (2008). Kernel-Based Text Classification on Statistical Manifold. In: Sun, F., Zhang, J., Tan, Y., Cao, J., Yu, W. (eds) Advances in Neural Networks - ISNN 2008. ISNN 2008. Lecture Notes in Computer Science, vol 5263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87732-5_52

  • DOI: https://doi.org/10.1007/978-3-540-87732-5_52

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87731-8

  • Online ISBN: 978-3-540-87732-5

  • eBook Packages: Computer Science, Computer Science (R0)
