DOI: 10.1145/3090354.3090398
research-article

A comparison of Text Classification methods Method of weighted terms selected by different Stemming Techniques

Published: 29 March 2017

Abstract

In information retrieval, three factors have an important impact on system performance: the stemming algorithm, the feature extraction method, and the classifier. In this work, we compare three well-known stemming techniques: the Lovins stemmer, the iterated Lovins stemmer, and the Snowball stemmer. For the classification phase, we experimentally compare five methods: BNET, NBMU, RF, SLogicF, and SVM. Building on these classifiers, we propose a new retrieval system that combines them by voting, in order to improve on the performance of the classical systems. Features are extracted with the TF-IDF weighting scheme. The proposed systems are tested on two datasets: BBCNEWS and BBCSPORT. The systems based on the Lovins stemmers and on the voting technique give the best results. On the first dataset, the best accuracy is observed for the Lovins + Vote system, with a recognition rate of about 97%. On the second dataset, the Snowball + Vote system reaches a recognition rate of 99%.
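The abstract names two computational ingredients, TF-IDF term weighting and a vote over several classifiers, without giving their exact variants. The Python sketch below illustrates both under stated assumptions: term frequency is the raw count, idf is log(N/df), and the vote is a plain plurality vote whose tie-breaking rule (the first label proposed, in classifier order) is hypothetical. The function names `tfidf` and `majority_vote` are illustrative and do not come from the paper; documents are assumed to be already tokenized and stemmed (for example by a Lovins or Snowball stemmer).

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant (not taken from the paper):
      tf  = raw count of the term in the document
      idf = log(N / df), N = number of documents,
            df = number of documents containing the term
    `docs` is a list of token lists (already stemmed).
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]

    # Document frequency: each document contributes at most 1 per term.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in c.items()}
        for c in counts
    ]


def majority_vote(predictions):
    """Combine several classifiers' outputs by plurality vote.

    predictions[k][i] is the label that classifier k assigns to document i.
    Ties are broken in favour of the label proposed first in classifier
    order (a hypothetical rule; the paper does not specify one).
    """
    combined = []
    for labels in zip(*predictions):  # all classifiers' labels for one document
        votes = Counter(labels)
        best = max(votes.values())
        # First label (in classifier order) that reaches the top vote count.
        combined.append(next(label for label in labels if votes[label] == best))
    return combined
```

For example, with three classifiers labelling two documents, `majority_vote([["sport", "politics"], ["sport", "sport"], ["politics", "politics"]])` returns `["sport", "politics"]`. A term that appears in every document gets an idf of log(1) = 0 and therefore no weight, which is why TF-IDF down-weights terms that do not discriminate between documents.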

Cited By

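  • (2023) Incorporating Word Embedding and Hybrid Model Random Forest Softmax Regression for Predicting News Categories. Multimedia Tools and Applications 83(11): 31279-31295. DOI: 10.1007/s11042-023-16491-7. Online publication date: 15 Sep 2023.
  • (2020) Neural Embedding & Hybrid ML Models for Text Classification. 2020 1st International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET), pp. 1-6. DOI: 10.1109/IRASET48871.2020.9092230. Online publication date: Apr 2020.
  • (2020) The Automatic option of inference rules for the fuzzy TF-IDF. 2020 IEEE 2nd International Conference on Electronics, Control, Optimization and Computer Science (ICECOCS), pp. 1-6. DOI: 10.1109/ICECOCS50124.2020.9314404. Online publication date: 2 Dec 2020.
  • (2019) Impact of Multistep Forecasting Strategies on Recurrent Neural Networks Performance for Short and Long Horizons. Proceedings of the 4th International Conference on Big Data and Internet of Things, pp. 1-8. DOI: 10.1145/3372938.3372979. Online publication date: 23 Oct 2019.
  • (2019) Text classification using Fuzzy TF-IDF and Machine Learning Models. Proceedings of the 4th International Conference on Big Data and Internet of Things, pp. 1-6. DOI: 10.1145/3372938.3372956. Online publication date: 23 Oct 2019.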

              Published In

              ACM Other conferences
              BDCA'17: Proceedings of the 2nd international Conference on Big Data, Cloud and Applications
              March 2017
              685 pages
              ISBN:9781450348522
              DOI:10.1145/3090354
              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

              In-Cooperation

              • Ministère de l'enseignement supérieur

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Author Tags

              1. Classification
              2. NB
              3. NBMU
              4. RF
              5. SLogiF
              6. SVM
              7. Stemmer
              8. term-weighting

              Qualifiers

              • Research-article
              • Research
              • Refereed limited

              Conference

              BDCA'17

              Article Metrics

              • Downloads (last 12 months): 6
              • Downloads (last 6 weeks): 0
              Reflects downloads up to 27 Jan 2025
