DOI: 10.1145/3231830.3231833

A new similarity measure for automatic text categorization based on vector space model

Published: 13 November 2017

Abstract

Text classification is the process of assigning a predefined class or category to an unlabeled text based on its content. It is an important task in text mining. Several text classification algorithms have been developed for natural languages such as English, Chinese, and Dutch. However, the number of related works for Arabic is limited. In this research, we attempt to generalize the method for computing the category representative vector and propose a new similarity measure (referred to hereafter as origin-similarity), based on a Vector Space Model (VSM), to classify Arabic documents, and we compare the proposed method with well-known similarity techniques. The evaluation used a dataset of 250 Arabic texts independently classified into five classes: art and culture, economics, politics, society, and sport. The experimental findings show that Arabic text classification using VSM provides the best results and could attribute the category of a text with an accuracy of 91%.
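As a rough illustration of the general pipeline the abstract describes (TF-IDF term weighting, one representative vector per category, and a similarity score to pick the category), the following minimal Python sketch may help. It is not the authors' method: the abstract does not define origin-similarity, so the sketch substitutes plain cosine similarity, one of the well-known baselines, and takes the mean of a category's document vectors as its representative vector. The function names, the smoothed IDF formula, and the toy English tokens standing in for stemmed Arabic terms are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def weigh(tokens, idf):
    """Build a sparse TF-IDF vector (dict) for one tokenized document."""
    tf = Counter(tokens)
    # Terms never seen in training have no IDF and are ignored.
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def tf_idf_vectors(docs):
    """Compute a smoothed IDF table and TF-IDF vectors for the training docs."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per doc
    # Assumed smoothing: avoids division by zero and zero weights.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [weigh(tokens, idf) for tokens in docs], idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def category_centroids(vectors, labels):
    """Assumed category representative vector: mean of the category's doc vectors."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for t, w in vec.items():
            sums[label][t] += w
    return {
        label: {t: w / counts[label] for t, w in s.items()}
        for label, s in sums.items()
    }


def classify(tokens, idf, centroids):
    """Assign the category whose representative vector is most similar."""
    vec = weigh(tokens, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy training data (hypothetical; English tokens stand in for stemmed Arabic terms).
train = [
    (["match", "goal", "team"], "sport"),
    (["team", "player", "goal"], "sport"),
    (["market", "bank", "price"], "economics"),
    (["bank", "inflation", "market"], "economics"),
]
vectors, idf = tf_idf_vectors([tokens for tokens, _ in train])
centroids = category_centroids(vectors, [label for _, label in train])

print(classify(["goal", "player", "match"], idf, centroids))
print(classify(["price", "inflation"], idf, centroids))
```

Swapping in a different similarity measure only requires replacing `cosine` in `classify`, which is the point at which a measure such as the paper's origin-similarity would be compared against the baselines.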

Published In

AWICT 2017: Proceedings of the Second International Conference on Advanced Wireless Information, Data, and Communication Technologies
November 2017
116 pages
ISBN: 9781450353106
DOI: 10.1145/3231830

In-Cooperation

  • CNRS: Centre National de la Recherche Scientifique

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Arabic text classification
  2. Similarity Coefficients
  3. Stemming
  4. Term frequency inverse document frequency
  5. Term weighting
  6. Vector Space Model

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

AWICT 2017
