A Comparative Study of Statistical Feature Reduction Methods for Arabic Text Categorization

  • Conference paper
Networked Digital Technologies (NDT 2010)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 88)

Abstract

Feature reduction methods have been successfully applied to text categorization. In this paper, we present a comparative study of three feature reduction methods for text categorization: Document Frequency (DF), Term Frequency Inverse Document Frequency (TFIDF), and Latent Semantic Analysis (LSA). Our feature set is relatively large, since the text files contain thousands of distinct terms. We propose using these feature reduction methods as a preprocessing step for a Back-Propagation Neural Network (BPNN), reducing the size of the input data during training. Experimental results on an Arabic data set show that, of the three dimensionality reduction techniques compared, TFIDF was the most effective at reducing the dimensionality of the feature space.
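As a rough illustration of two of the methods named in the abstract, the sketch below applies DF thresholding to shrink a vocabulary and then builds TFIDF vectors over the surviving terms. The abstract gives no formulas, so the common weighting tf * log(N / df) is assumed here; the function names, the `min_df` cutoff, and the toy English tokens are all hypothetical and are not taken from the paper. LSA is left out because it needs a singular value decomposition, which goes beyond a short standard-library example.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so a term counts once per document
    return df


def select_by_df(docs, min_df=2):
    """DF thresholding: keep terms found in at least min_df documents."""
    df = document_frequency(docs)
    return sorted(term for term, count in df.items() if count >= min_df)


def tfidf_vectors(docs, vocab):
    """Build one TFIDF vector per document over the reduced vocabulary.

    Assumed weighting: raw term count times log(N / df).
    """
    n_docs = len(docs)
    df = document_frequency(docs)
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vectors


# Hypothetical toy corpus: each document is a list of preprocessed tokens.
docs = [
    ["prayer", "fasting", "prayer"],
    ["fasting", "charity"],
    ["prayer", "charity", "pilgrimage"],
]

vocab = select_by_df(docs, min_df=2)   # drops the rare term "pilgrimage"
vectors = tfidf_vectors(docs, vocab)   # these rows would feed the classifier
```

In a pipeline like the one the abstract describes, the rows of `vectors` would serve as the reduced input to the neural network in place of the full term space.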



Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Harrag, F., El-Qawasmeh, E., Al-Salman, A.M.S. (2010). A Comparative Study of Statistical Feature Reduction Methods for Arabic Text Categorization. In: Zavoral, F., Yaghob, J., Pichappan, P., El-Qawasmeh, E. (eds) Networked Digital Technologies. NDT 2010. Communications in Computer and Information Science, vol 88. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14306-9_67

  • DOI: https://doi.org/10.1007/978-3-642-14306-9_67

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14305-2

  • Online ISBN: 978-3-642-14306-9

  • eBook Packages: Computer Science, Computer Science (R0)
