research-article

Natural Language Processing System for Text Classification Corpus Based on Machine Learning

Author:

Yawen Su

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 8

Article No.: 123, Pages 1 - 15

https://doi.org/10.1145/3648361

Published: 08 August 2024

Abstract

A hazard classification system for air traffic control was investigated using the Human Factors Analysis and Classification System (HFACS) framework and natural language processing, with the goal of preventing hazardous situations in air traffic control. Building on the HFACS standard, an air traffic control hazard classification system was created. Hazard data from the aviation safety management system were selected as the corpus, then classified and labeled at five levels. To address the small samples, many labels, and random samples found in air traffic control hazard data, the experiments used a text classification method based on key content extraction with Term Frequency-Inverse Document Frequency (TF-IDF) and TextRank, together with text classification models based on a Convolutional Neural Network (CNN) and Bidirectional Encoder Representations from Transformers (BERT). The results show that the overall trade-off between model training time and classification accuracy is best when about 8 keywords are extracted. As the number of points increases, the time spent on dimensioning decreases, which affects accuracy. When the number of points reaches about 93, the time spent on dimensioning increases while classification accuracy remains close to 0.7, so the rising time cost lowers the overall benefit. The results indicate that extracting key content can address text classification problems with small samples and can contribute to further research on the development of safety systems.
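As an illustration only, the key content extraction step mentioned in the abstract can be sketched as TF-IDF keyword ranking. This is not the paper's implementation: the naive tokenizer, the smoothed IDF formula, the function names `tokenize` and `top_keywords`, and the sample hazard reports are all assumptions made for this sketch. Only the default of 8 keywords is taken from the abstract.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(docs, k=8):
    """Return the k highest-scoring TF-IDF terms for each document.

    k=8 mirrors the keyword count reported in the abstract; everything
    else here is an assumption made for illustration.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty documents
        scores = {
            # Smoothed IDF: a term present in every document scores 0.
            term: (count / total) * math.log((1 + n_docs) / (1 + df[term]))
            for term, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically for a stable order.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results


if __name__ == "__main__":
    # Hypothetical hazard reports, invented for this example.
    reports = [
        "controller missed the readback",
        "pilot missed the clearance",
        "the radar display failed",
    ]
    for keywords in top_keywords(reports, k=2):
        print(keywords)
```

The shortened documents produced this way could then be passed to a downstream classifier such as a CNN or BERT model, which is the general pipeline the abstract describes. A TextRank variant would replace the TF-IDF score with a graph-based ranking over word co-occurrences.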

Index Terms

Natural Language Processing System for Text Classification Corpus Based on Machine Learning
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification

Information

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing Volume 23, Issue 8

August 2024

343 pages

EISSN:2375-4702

DOI:10.1145/3613611

Editor: Imed Zitouni (Google, USA)

Guest Editors: Deepak Kumar Jain, Thierry Boumans, Stefano Berretti

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 August 2024

Online AM: 19 February 2024

Accepted: 31 January 2024

Revised: 24 December 2023

Received: 30 October 2023

Published in TALLIP Volume 23, Issue 8

Funding Sources

Study on Application Limits and ethical Risk Assessment of rural Artificial Intelligence Education
Fujian Province Education Science “14th Five-year Plan” 2022 annual special
