ABSTRACT
This paper compares three techniques for Arabic text classification: Support Vector Machine (SVM) trained with Sequential Minimal Optimization (SMO), Naïve Bayesian (NB), and J48. The main objective is to measure the accuracy of each classifier and to determine which one is most accurate for Arabic text classification when stop words are eliminated. Accuracy is measured with the percentage split (holdout) method and with k-fold cross-validation. The results show that the SMO classifier achieves the highest accuracy and the lowest error rate, and that the SMO model also takes the least time to build.
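The evaluation steps named in the abstract (stop-word elimination, percentage split, and k-fold cross-validation) can be illustrated with a minimal Python sketch. This is not the authors' setup: the stop-word set is a tiny hypothetical sample, the 66% split and k = 10 are arbitrary defaults, and all function names are invented for illustration. Any classifier (SMO, NB, or J48) would be trained on the `train` indices and scored on the `test` indices.

```python
import random

# Tiny illustrative stop-word sample (assumption; not the paper's list).
ARABIC_STOP_WORDS = {"في", "من", "على", "إلى", "عن", "أن", "هذا"}

def remove_stop_words(text, stop_words=ARABIC_STOP_WORDS):
    """Drop whitespace-separated tokens that appear in the stop-word set."""
    return [tok for tok in text.split() if tok not in stop_words]

def holdout_split(n_items, train_fraction=0.66, seed=0):
    """Percentage split: shuffle indices, first part trains, rest tests."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    cut = int(n_items * train_fraction)
    return idx[:cut], idx[cut:]

def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train, test) index lists; each item is tested exactly once."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

With k-fold cross-validation the per-fold accuracies are usually averaged, and the error rate is one minus the accuracy.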
Index Terms
- A comparative study for Arabic text classification algorithms based on stop words elimination