Abstract

Modern web search engines rely heavily on information retrieval techniques, which limits how well they meet users' requirements. To reach the pages most relevant to a user's subject, a categorization system that analyses content and presents information precisely could be a good alternative. Most text categorization research relies heavily on training datasets. These datasets are usually large, so most approximations and computations are time consuming, which makes the entire categorization system slow and inaccurate. The proposed method is novel and uses a number of features. This paper explores the effect of a word's values in a document, which express the features of that word in the document. The proposed features are tf-itf, the position of the word, and compactness. These features are combined and evaluated. The experimental results showed a significant improvement in the text categorization process.
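
As a rough illustration of what per-word features of this kind can look like, the sketch below computes three values for every word in a tokenized document. It is not the authors' implementation. The abstract does not define tf-itf, so plain relative term frequency stands in for it. "Position" is assumed to mean the normalized index of a word's first appearance, and "compactness" is assumed to mean the normalized distance between its first and last appearance. The function name `word_features` is hypothetical.

```python
from collections import defaultdict


def word_features(tokens):
    """Return {word: (tf, first_pos, compactness)} for one tokenized document.

    Assumptions (not taken from the paper):
      tf          = occurrences of the word / document length
      first_pos   = index of first occurrence / document length
      compactness = (last index - first index) / document length
    """
    # Record every index at which each word occurs, in document order.
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)

    n = len(tokens)
    features = {}
    for word, pos in positions.items():
        tf = len(pos) / n
        first_pos = pos[0] / n
        compactness = (pos[-1] - pos[0]) / n
        features[word] = (tf, first_pos, compactness)
    return features


doc = "search engines rank pages and search engines index pages".split()
feats = word_features(doc)
for word, values in feats.items():
    print(word, values)
```

A word that occurs once gets a compactness of 0 under this definition, while a word spread across the document gets a larger value. The three values could then be combined into one weight per word, although the abstract does not say how the authors combine them.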

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Sohail, A., Kotha, C., Chavali, R.K., Meghana, K., Manne, S., Fatima, S. (2014). An Extensive Selection of Features as Combinations for Automatic Text Categorization. In: Satapathy, S., Udgata, S., Biswal, B. (eds) Proceedings of the International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA) 2013. Advances in Intelligent Systems and Computing, vol 247. Springer, Cham. https://doi.org/10.1007/978-3-319-02931-3_42

  • DOI: https://doi.org/10.1007/978-3-319-02931-3_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-02930-6

  • Online ISBN: 978-3-319-02931-3

  • eBook Packages: Engineering (R0)
