An Extensive Selection of Features as Combinations for Automatic Text Categorization

Sohail, Aamir; Kotha, Chaitanya; Chavali, Rishanth Kanakadri; Meghana, Krishna; Manne, Suneetha; Fatima, Sameen

doi:10.1007/978-3-319-02931-3_42

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 247)

Abstract

Modern web search engines rely heavily on information retrieval techniques, and the pages they return match users' requirements only to a limited extent. To give users access to the most relevant subject-specific pages, a categorization system that analyses content and presents information precisely could be a good alternative. For text categorization, most researchers have relied heavily on training datasets. Each training dataset is usually large, so most approximations and computations are time-consuming, which makes the entire categorization system slow and inaccurate. The proposed method is novel in that it uses a number of features. This paper explores the effect of a word, and of other values of the word in the document, that express the features of that word in the document. The proposed features are exploited through tf-itf, the position of the word, and compactness. These features are combined and evaluated. The experimental results show a significant improvement in the text categorization process.
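The abstract names three per-word features (tf-itf, word position, and compactness) that are combined and evaluated, but it does not define them. The sketch below is one possible reading under stated assumptions, not the authors' method: it assumes tf-itf is computed like tf-idf (relative term frequency times the log of the number of texts over the number of texts containing the term), treats each sentence as a "part" of the document, scores position by how early the word first appears, scores compactness by how tightly the word's occurrences cluster across sentences, and combines the three with a weighted sum. All function names, formulas, and weights here are hypothetical.

```python
import math
from collections import Counter


def parts(doc):
    """Split a document into 'parts' (assumed: sentences split on '.'), each a list of lowercase tokens."""
    return [s.split() for s in doc.lower().split(".") if s.strip()]


def tokens(doc):
    """Flatten all parts of a document into one token list."""
    return [t for part in parts(doc) for t in part]


def tf_itf(term, doc, corpus):
    """Assumed tf-idf-style weight: relative frequency of `term` in `doc`
    times log(N / number of texts in `corpus` that contain `term`)."""
    toks = tokens(doc)
    containing = sum(1 for d in corpus if term in tokens(d))
    if not toks or containing == 0:
        return 0.0
    tf = Counter(toks)[term] / len(toks)
    return tf * math.log(len(corpus) / containing)


def first_position(term, doc):
    """Hypothetical position score: 1.0 if `term` first appears in the first part,
    decreasing linearly the later it first appears; 0.0 if it never appears."""
    ps = parts(doc)
    for i, part in enumerate(ps):
        if term in part:
            return (len(ps) - i) / len(ps)
    return 0.0


def compactness(term, doc):
    """Hypothetical compactness score: 1 / (1 + variance of the part indices of
    the term's occurrences). Occurrences in a single part score 1.0; spread-out
    occurrences score lower; an absent term scores 0.0."""
    idx = [i for i, part in enumerate(parts(doc)) for t in part if t == term]
    if not idx:
        return 0.0
    mean = sum(idx) / len(idx)
    var = sum((i - mean) ** 2 for i in idx) / len(idx)
    return 1.0 / (1.0 + var)


def combined_score(term, doc, corpus, weights=(1.0, 1.0, 1.0)):
    """Combine the three features with a hypothetical weighted sum."""
    w_tf, w_pos, w_comp = weights
    return (w_tf * tf_itf(term, doc, corpus)
            + w_pos * first_position(term, doc)
            + w_comp * compactness(term, doc))


if __name__ == "__main__":
    corpus = [
        "Stocks rose today. Markets rallied on stocks.",
        "The team won the match. Fans cheered.",
        "Stocks fell. The team lost.",
    ]
    for term in ("stocks", "markets"):
        print(term, round(combined_score(term, corpus[0], corpus), 4))
```

Setting one weight to 0 drops that feature, which is one simple way to compare feature combinations; the actual combination scheme the authors evaluated is not stated in the abstract.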

Author information

Authors and Affiliations

Department of IT, VRSEC, Vijayawada, India
Aamir Sohail, Chaitanya Kotha, Rishanth Kanakadri Chavali, Krishna Meghana & Suneetha Manne
Department of CSE, Osmania University, Hyderabad, India
Sameen Fatima

Editor information

Editors and Affiliations

Dept. of Computer Science Engineering, Anil Neerukonda Institute of Technology and Sciences, Vishakapatnam, Andhra Pradesh, India
Suresh Chandra Satapathy
University of Hyderabad, Hyderabad, Andhra Pradesh, India
Siba K Udgata
Bhubaneswar Engineering College, Bhubaneswar, India
Bhabendra Narayan Biswal

About this paper

Cite this paper

Sohail, A., Kotha, C., Chavali, R.K., Meghana, K., Manne, S., Fatima, S. (2014). An Extensive Selection of Features as Combinations for Automatic Text Categorization. In: Satapathy, S., Udgata, S., Biswal, B. (eds) Proceedings of the International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA) 2013. Advances in Intelligent Systems and Computing, vol 247. Springer, Cham. https://doi.org/10.1007/978-3-319-02931-3_42

DOI: https://doi.org/10.1007/978-3-319-02931-3_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02930-6
Online ISBN: 978-3-319-02931-3
eBook Packages: Engineering, Engineering (R0)
