Applying GA-SVM for Optimizing Statistical and Semantic Features in Document Classification

Authors:
Upasana Pandey, Computer Sciences & Engineering, IMS Engineering College, Ghaziabad, India
Geeta Rani, Computer and Communication Engineering, Manipal University Jaipur, Jaipur, India
Vijaypal Singh Dhaka, Computer and Communication Engineering, Manipal University Jaipur, Jaipur, India

DSMLAI '21': Proceedings of the International Conference on Data Science, Machine Learning and Artificial Intelligence, August 2021, Pages 253–260. https://doi.org/10.1145/3484824.3484912

Published: 13 January 2022

ABSTRACT

The objective of this research is to develop a hybrid model that improves the performance of text classification techniques. The authors applied a Genetic Algorithm together with a multi-class Support Vector Machine to two publicly available datasets, the 20 Newsgroups corpus and the Reuters-21578 corpus, and to their own handcrafted 'Creative corpus', compiled from news articles on the Times of India news portal. They evaluated the model on large as well as small corpora. The Genetic Algorithm dynamically decides the weights of the contextual features so as to achieve the highest classification accuracy. The model reaches its highest accuracy, 100%, on small datasets of Reuters-21578 and the Creative corpus. The authors also present a comparative analysis of the statistical and context-based approaches to text classification. Based on the experimental results, they conclude that statistical approaches are better suited to classifying small documents, whereas context-based approaches are more effective on large, text-rich documents. This contrast shows the value of the hybrid approach, which draws on ontological databases and adapts to varying corpora. It can therefore make effective use of the textual data available in reports for tasks such as crime detection, crime classification, and disease diagnosis.
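The idea of letting a Genetic Algorithm choose feature weights can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give their chromosome encoding, genetic operators, or SVM configuration. Here the multi-class SVM is replaced by a much simpler stand-in (pick the class with the highest weighted score), and the toy data, the function names, and all parameter values are assumptions made purely for illustration.

```python
import random

# Toy data, invented for illustration: for each document, every candidate
# class has a (statistical_score, semantic_score) pair, followed by the label.
DOCS = [
    ({"sport": (0.9, 0.2), "crime": (0.4, 0.5)}, "sport"),
    ({"sport": (0.3, 0.6), "crime": (0.5, 0.1)}, "sport"),
    ({"sport": (0.6, 0.1), "crime": (0.2, 0.7)}, "crime"),
    ({"sport": (0.2, 0.3), "crime": (0.7, 0.2)}, "crime"),
]

def predict(weights, scores_per_class):
    """Stand-in classifier (not an SVM): class with the highest weighted score."""
    return max(
        scores_per_class,
        key=lambda c: sum(w * s for w, s in zip(weights, scores_per_class[c])),
    )

def fitness(weights, docs):
    """Fitness of a weight vector: classification accuracy on the documents."""
    return sum(predict(weights, scores) == label for scores, label in docs) / len(docs)

def evolve(docs, n_features=2, pop_size=20, generations=30,
           mutation_rate=0.2, seed=0):
    """Evolve feature weights in [0, 1] that maximize accuracy (hypothetical settings)."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged (elitism).
        pop.sort(key=lambda w: fitness(w, docs), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)      # one-point crossover
            child = a[:cut] + b[cut:]
            # Gaussian mutation, clamped so weights stay in [0, 1].
            child = [
                min(1.0, max(0.0, g + rng.gauss(0, 0.1)))
                if rng.random() < mutation_rate else g
                for g in child
            ]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, docs))

if __name__ == "__main__":
    best = evolve(DOCS)
    print("weights:", best, "accuracy:", fitness(best, DOCS))
```

On this toy data, using only the statistical score (weights `[1.0, 0.0]`) or only the semantic score (weights `[0.0, 1.0]`) classifies half of the documents correctly, so any gain must come from combining the two. In a fuller version, `fitness` would train and validate a multi-class SVM on the weighted features instead of using the argmax rule.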

Index Terms

Applying GA-SVM for Optimizing Statistical and Semantic Features in Document Classification
1. Computing methodologies
  1. Artificial intelligence
  2. Machine learning

Published in
DSMLAI '21': Proceedings of the International Conference on Data Science, Machine Learning and Artificial Intelligence
August 2021
415 pages
ISBN: 9781450387637
DOI: 10.1145/3484824
Editors:
Dharm Singh Jat, Namibia University of Science and Technology
Colin Stanley, Namibia University of Science and Technology
José Quenum, Namibia University of Science and Technology
Nilanjan Dey, JIS University, Kolkata
Arpit Jain, Namibia University of Science and Technology
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Genetic algorithm
Hybrid
Machine Learning
Natural Language Processing
SVM
Qualifiers
- research-article
- Research
- Refereed limited