Combining Words and Concepts for Automatic Arabic Text Classification

Alahmadi, Alaa; Joorabchi, Arash; Mahdi, Abdulhussain E.

doi:10.1007/978-3-319-73500-9_8

Part of the book series: Communications in Computer and Information Science (CCIS, volume 782)

Included in the following conference series:

International Conference on Arabic Language Processing

Abstract

This paper examines how combining words and concepts in the text representation affects the accuracy of Arabic Automatic Text Classification (ATC) when used with various stemming methods and classifiers. An experimental Arabic ATC system was developed, and the effect of each of its main components on classification accuracy was assessed. First, variants of the standard Bag-of-Words model with different stemming methods were examined and compared. Next, Arabic Wikipedia and WordNet were compared as sources of concepts for an effective Bag-of-Concepts representation. Based on this comparison, Wikipedia was then used to provide concepts, and different strategies for combining words and concepts, including two new approaches developed in-house, were evaluated in terms of their impact on overall classification accuracy. The experimental results show that text representation is a key element in the performance of Arabic ATC, and that combining words and concepts to represent Arabic text improves classification accuracy compared with using words or concepts alone.
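
As a rough illustration of what a combined representation can look like, the sketch below builds a Bag-of-Words vector, a Bag-of-Concepts vector, and their union for one tokenized document. It is not the authors' method: the abstract does not describe the combination strategies in detail, so simple feature concatenation is assumed here. The `CONCEPTS` lookup table, the function names, and the `w:`/`c:` prefixes are all hypothetical, and English tokens are used only for readability; a real system would work on stemmed Arabic tokens and derive concepts from Arabic Wikipedia.

```python
from collections import Counter

# Hypothetical phrase-to-concept lookup (an assumption for illustration);
# a real system would derive it from Wikipedia article titles or anchors.
CONCEPTS = {
    ("machine", "learning"): "Machine_learning",
    ("text", "classification"): "Document_classification",
}


def bag_of_words(tokens):
    """Count each token, prefixed with 'w:' to mark it as a word feature."""
    return Counter(f"w:{t}" for t in tokens)


def bag_of_concepts(tokens, concepts=CONCEPTS, max_len=3):
    """Count concepts found by greedy longest-match over token n-grams."""
    found = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in concepts:
                found[f"c:{concepts[key]}"] += 1
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no concept starts at this token
    return found


def combined(tokens):
    """Assumed combination strategy: keep all word and concept features."""
    return bag_of_words(tokens) + bag_of_concepts(tokens)


tokens = "machine learning for text classification".split()
features = combined(tokens)
```

The prefixes keep word and concept features in separate namespaces, so a word and a concept with the same surface form do not collide when the resulting counts are passed to a classifier.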

Author information

Authors and Affiliations

Electronic and Computer Engineering Department, University of Limerick, Limerick, Ireland
Alaa Alahmadi, Arash Joorabchi & Abdulhussain E. Mahdi

Corresponding author

Correspondence to Abdulhussain E. Mahdi.

Editor information

Editors and Affiliations

Ex ENSA-USMBA, Fez, Morocco
Abdelmonaime Lachkar
EMI, UM5, Rabat, Morocco
Karim Bouzoubaa
FS, UMP, Oujda, Morocco
Azzedine Mazroui
IERA, UM5, Rabat, Morocco
Abdelfettah Hamdani
FS, UMP, Oujda, Morocco
Abdelhak Lekhouaja

About this paper

Cite this paper

Alahmadi, A., Joorabchi, A., Mahdi, A.E. (2018). Combining Words and Concepts for Automatic Arabic Text Classification. In: Lachkar, A., Bouzoubaa, K., Mazroui, A., Hamdani, A., Lekhouaja, A. (eds) Arabic Language Processing: From Theory to Practice. ICALP 2017. Communications in Computer and Information Science, vol 782. Springer, Cham. https://doi.org/10.1007/978-3-319-73500-9_8

DOI: https://doi.org/10.1007/978-3-319-73500-9_8
Published: 05 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73499-6
Online ISBN: 978-3-319-73500-9
eBook Packages: Computer Science, Computer Science (R0)
