Natural Language Processing System for Text Classification Corpus Based on Machine Learning

Published: 08 August 2024

Abstract

A classification system for hazard sources in air traffic control was investigated using the Human Factors Analysis and Classification System (HFACS) framework and natural language processing, with the aim of preventing hazardous situations in air traffic control. Building on the HFACS standard, an air traffic control hazard classification system is constructed. Hazard data from the aviation safety management system are selected as the corpus, then classified and labeled at five levels. A Term Frequency-Inverse Document Frequency (TF-IDF) TextRank text classification method based on key content extraction, together with text classification models based on a Convolutional Neural Network (CNN) and Bidirectional Encoder Representations from Transformers (BERT), were used in the experiments to address the problems of small samples, many labels, and random samples in air traffic control hazard data. The results show that the combined measure of model training time and classification accuracy is highest when the number of keywords is around 8. As the number of points increases, the time spent in dimensioning decreases, which affects accuracy. When the number of points reaches about 93, the time spent in determining the size increases while the classification accuracy remains close to 0.7, so the added time lowers the total cost. The results indicate that extracting key content can address text classification problems for small samples and can contribute to further research on the development of safety systems.
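
The abstract does not spell out how TF-IDF and TextRank are combined, so the following is only a minimal sketch of one common pairing: TextRank run over a word co-occurrence graph, with normalized TF-IDF weights used as the teleport distribution. The function names, the smoothed IDF formula, the window size, and the damping factor are all assumptions made for illustration and are not taken from the article; only the default of 8 keywords echoes the figure reported above.

```python
import math
from collections import Counter, defaultdict


def tfidf_weights(doc_tokens, corpus_tokens):
    """Return a TF-IDF weight for every distinct word in doc_tokens."""
    n_docs = len(corpus_tokens)
    doc_freq = Counter()
    for tokens in corpus_tokens:
        doc_freq.update(set(tokens))  # count each word once per document
    term_freq = Counter(doc_tokens)
    return {
        # Smoothed IDF (an assumption): always >= 1, so weights stay positive.
        word: (count / len(doc_tokens))
        * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0)
        for word, count in term_freq.items()
    }


def textrank_keywords(doc_tokens, corpus_tokens, top_k=8, window=3,
                      damping=0.85, iterations=50):
    """Rank words by TextRank biased toward high TF-IDF words (hypothetical helper)."""
    if not doc_tokens:
        return []

    # Undirected co-occurrence graph: words within `window` positions are linked.
    neighbors = defaultdict(Counter)
    for i, word in enumerate(doc_tokens):
        for other in doc_tokens[i + 1:i + window]:
            if other != word:
                neighbors[word][other] += 1
                neighbors[other][word] += 1

    # Normalized TF-IDF serves as the teleport (prior) distribution.
    weights = tfidf_weights(doc_tokens, corpus_tokens)
    total = sum(weights.values())
    prior = {word: weight / total for word, weight in weights.items()}

    scores = dict(prior)
    for _ in range(iterations):
        new_scores = {}
        for word in prior:
            incoming = sum(
                scores[other] * count / sum(neighbors[other].values())
                for other, count in neighbors[word].items()
            )
            new_scores[word] = (1 - damping) * prior[word] + damping * incoming
        scores = new_scores

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]
```

Called as `textrank_keywords(report_tokens, all_report_tokens)`, where both arguments are already tokenized hazard reports, the function returns at most `top_k` words that could then be passed to a downstream classifier in place of the full text. Tokenization, stop-word removal, and the classifier itself are left out of this sketch.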

    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 23, Issue 8
    August 2024
    343 pages
    EISSN:2375-4702
    DOI:10.1145/3613611

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 August 2024
    Online AM: 19 February 2024
    Accepted: 31 January 2024
    Revised: 24 December 2023
    Received: 30 October 2023
    Published in TALLIP Volume 23, Issue 8

    Author Tags

    1. Safety social engineering
    2. air traffic control system
    3. hazard sources
    4. HFACS model
    5. TFIDF TextRank method
    6. SVM optimization

    Qualifiers

    • Research-article

    Funding Sources

    • Study on Application Limits and Ethical Risk Assessment of Rural Artificial Intelligence Education
    • Fujian Province Education Science “14th Five-year Plan” 2022 annual special
