Representative Term Based Feature Selection Method for SVM Based Document Classification

  • Conference paper
Knowledge-Based Intelligent Information and Engineering Systems (KES 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3681)

Abstract

This paper describes a classifier for web documents in the field of Information Technology. It uses a Support Vector Machine (SVM) to learn a model constructed from the training sets and their representative terms. Automatic text classification is needed to reduce information overload when handling very large document collections. An SVM is a model computed as a weighted sum of kernel function outputs. The basic idea is to exploit, by simple statistical methods, the meaning distribution of the representative terms in the coherent thematic texts of each category. Documents in each category are represented with the vector-space model, using a feature selection scheme based on TF-IDF. We apply to the feature selection a category factor that represents the effect of a term within a category. Experiments show the categorization results and the correlation with vector length.
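
As a rough illustration of two ideas named in the abstract, the sketch below builds TF-IDF vectors for a toy corpus and evaluates an SVM decision function as a weighted sum of kernel outputs, f(x) = Σ_i c_i K(s_i, x) + b. It is a minimal sketch under stated assumptions, not the paper's method: the toy documents, the linear kernel, the coefficient values, and the helper names `tfidf_vectors` and `svm_decision` are all hypothetical, and the category factor is left out because the abstract does not define it.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Weight = raw term frequency * log(N / document frequency).
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


def linear_kernel(u, v):
    """Plain dot product; an assumed kernel choice for this sketch."""
    return sum(a * b for a, b in zip(u, v))


def svm_decision(x, support_vectors, coeffs, bias, kernel=linear_kernel):
    """Weighted sum of kernel outputs plus a bias term.

    Each coefficient stands for alpha_i * y_i of one support vector.
    """
    return sum(c * kernel(sv, x) for c, sv in zip(coeffs, support_vectors)) + bias


# Toy corpus (hypothetical): two IT-style documents and one networking document.
docs = [
    ["svm", "kernel", "classifier"],
    ["kernel", "margin", "svm"],
    ["router", "network", "packet"],
]
vocab, vecs = tfidf_vectors(docs)

# Hand-picked coefficients for illustration only; a real SVM learns them.
support_vectors = [vecs[0], vecs[2]]
coeffs = [1.0, -1.0]
score_same_topic = svm_decision(vecs[1], support_vectors, coeffs, bias=0.0)
score_other_topic = svm_decision(vecs[2], support_vectors, coeffs, bias=0.0)
```

With these assumed coefficients, the second document shares weighted terms only with the first support vector, so its score is positive, while the networking document scores negative. A category factor like the one the paper proposes would presumably rescale the term weights before this step, but its exact form is not given here.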

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kang, Y. (2005). Representative Term Based Feature Selection Method for SVM Based Document Classification. In: Khosla, R., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2005. Lecture Notes in Computer Science, vol 3681. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552413_9

  • DOI: https://doi.org/10.1007/11552413_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28894-7

  • Online ISBN: 978-3-540-31983-2

  • eBook Packages: Computer Science, Computer Science (R0)
