research-article

A comparison of Text Classification methods Method of weighted terms selected by different Stemming Techniques

Authors:

K. El Moutaouakil,

Kh. Satori

BDCA'17: Proceedings of the 2nd international Conference on Big Data, Cloud and Applications

Article No.: 43, Pages 1 - 9

https://doi.org/10.1145/3090354.3090398

Published: 29 March 2017

Abstract

In information retrieval, three factors have a major impact on system performance: the stemming algorithm, the feature extraction method, and the classification tool. In this work, we compare three well-known stemming techniques: the Lovins stemmer, the iterated Lovins stemmer, and the Snowball stemmer. For the classification phase, we experimentally compare five methods: BNET, NBMU, RF, SLogicF, and SVM. Building on these classifiers, we propose a new retrieval system that combines them through voting to improve on the performance of the classical systems. Features are extracted with the TF-IDF algorithm. The resulting systems are tested on two datasets: BBCNEWS and BBCSPORT. The systems based on the Lovins stemmers and on the voting technique give the best results. On the first dataset, the best accuracy is obtained by the Lovins + Vote system, with a recognition rate of about 97%. On the second dataset, the Snowball + Vote system reaches a recognition rate of 99%.
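To make the pipeline described in the abstract more concrete, the following is a minimal Python sketch of two of its steps: TF-IDF weighting of terms and combining classifier outputs by vote. The abstract does not specify the exact TF-IDF variant or voting rule, so the choices here (raw term frequency times log(N/df), plain majority vote with ties going to the earliest label) are assumptions, and the function names `tfidf` and `majority_vote` are hypothetical. Documents are assumed to be already tokenized and stemmed.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: raw term frequency * log(N / document frequency).
    The paper may use a different weighting.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def majority_vote(predictions):
    """Return the most frequent label among the classifiers' predictions.

    Assumed tie-break: the label that appears first in the list wins.
    Expects a non-empty list.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    return next(label for label in predictions if counts[label] == best)
```

In use, each document would be turned into a weight vector by `tfidf`, each of the five classifiers would predict a label for it, and `majority_vote` would pick the final label from those predictions.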

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
2. Information systems
  1. Information retrieval
  2. Information systems applications

Index terms have been assigned to the content through auto-classification.

Published In

BDCA'17: Proceedings of the 2nd international Conference on Big Data, Cloud and Applications

March 2017

685 pages

ISBN:9781450348522

DOI:10.1145/3090354

Conference Chairs:
Mohamed Lazaar (ENSA, Tetuan, Morocco),
Youness Tabii (ENSA, Tetuan, Morocco),
Mohamed Chrayah (ENSA, Tetuan, Morocco),
Mohammed Al Achhab (ENSA, Tetuan, Morocco)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

Ministère de l'enseignement supérieur

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

BDCA'17

BDCA'17: 2nd international Conference on Big Data, Cloud and Applications

March 29 - 30, 2017

Tetouan, Morocco
