research-article

A comparative study for Arabic text classification algorithms based on stop words elimination

Authors:
Bassam Al-Shargabi, Al-Isra University, Amman-Jordan
Waseem Al-Romimah, University of Science and Technology, Sana'a-Yemen
Fekry Olayah, Al-Isra University, Amman-Jordan

ISWSA '11: Proceedings of the 2011 International Conference on Intelligent Semantic Web-Services and Applications, April 2011, Article No.: 11, Pages 1–5, https://doi.org/10.1145/1980822.1980833

Published: 18 April 2011

ABSTRACT

This paper compares three techniques for Arabic text classification: Support Vector Machine (SVM) trained with Sequential Minimal Optimization (SMO), Naïve Bayesian (NB), and J48. The main objective is to measure the accuracy of each classifier and to determine which is the most accurate for Arabic text classification when stop words are eliminated. Classifier accuracy is measured with the percentage split (holdout) method and with K-fold cross-validation. The results show that the SMO classifier achieves the highest accuracy and the lowest error rate, and that it also takes the least time to build its model.
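
The evaluation steps named in the abstract (stop-word elimination, percentage split, and K-fold cross-validation) can be sketched as follows. This is a minimal illustration in Python and not the authors' setup: the stop-word list, the split fraction, all function names, and the majority-class stand-in classifier are assumptions made for the example. The SMO, NB, and J48 classifiers compared in the paper are not reproduced here.

```python
import random
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only;
# the list used in the paper is not given on this page.
ARABIC_STOP_WORDS = {"في", "من", "على", "إلى", "عن"}


def remove_stop_words(tokens, stop_words=ARABIC_STOP_WORDS):
    """Drop every token that appears in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


def holdout_split(n_items, train_fraction=0.7, seed=0):
    """Percentage split: shuffle the indices once, then cut into train/test.

    The 0.7 default is an assumption, not a value taken from the paper.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    cut = round(n_items * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train, test) index lists; each item is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def majority_baseline(train_labels):
    """Stand-in classifier: always predicts the most frequent training label."""
    label = Counter(train_labels).most_common(1)[0][0]
    return lambda _doc: label


def cross_validated_accuracy(docs, labels, k):
    """Average test accuracy over k folds using the stand-in classifier."""
    scores = []
    for train, test in k_fold_splits(len(docs), k):
        predict = majority_baseline([labels[i] for i in train])
        predicted = [predict(docs[i]) for i in test]
        scores.append(accuracy(predicted, [labels[i] for i in test]))
    return sum(scores) / len(scores)
```

In a real comparison, `majority_baseline` would be replaced by each classifier under test, and the same folds would be reused for every classifier so that the accuracy figures are comparable.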

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
2. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection

Published in
ISWSA '11: Proceedings of the 2011 International Conference on Intelligent Semantic Web-Services and Applications
April 2011
112 pages
ISBN: 9781450304740
DOI: 10.1145/1980822
Conference Chair: Ayman Alnsour
Program Chair: Shadi Aljawarneh
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Arabic text classification
naive bayesian
stop word elimination
support vector machine