Language Model Method for Collocation Rules of Parts of Speech in Machine Translation System

Published: 07 August 2024

Abstract

Modern society has entered the Internet of Things (IoT) information age, and Machine Translation (MT) plays an important role in increasingly frequent cross-language communication. In recent years, China's artificial intelligence industry has been built up rapidly, the scale of its core industries has grown explosively, and a large number of artificial intelligence companies, including issuers, have emerged. Part of speech has long been a major problem in MT. One reason is that Chinese contains many multi-category words and English contains many polysemous words, so part-of-speech collocation problems account for a large proportion of MT errors, which to some extent affects the credibility and accuracy of translations. To reduce part-of-speech collocation errors in MT, this paper used Machine Learning (ML) methods to build a Language Model (LM) of part-of-speech collocation based on a recurrent neural network (NN) and compared it with a traditional statistical LM. Measured by the accuracy rate of the two LMs under the automatic evaluation metric for machine translation, the experimental results show that the recurrent NN LM built with ML methods achieved accuracies of 80.42% and 83.57% on the part-of-speech collocation rules of the IoT machine translation system for Chinese-English dialogue and for article translation, respectively, whereas the traditional statistical LM achieved 71.29% and 69.52%. This shows that the recurrent NN LM reduces part-of-speech collocation errors in MT and improves the accuracy and credibility of MT.
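To make the contrast concrete, below is a minimal sketch, not the paper's implementation, of a recurrent language model over part-of-speech tag sequences, the kind of model the abstract compares against an n-gram statistical LM. It is written in Python with PyTorch; the tag inventory, model sizes, and example sequences are illustrative assumptions. Such a model assigns a probability to a sequence of POS tags, so implausible collocations in a candidate translation receive low scores.

```python
# Minimal sketch, not the paper's implementation: a recurrent (GRU) language
# model over part-of-speech tag sequences. The tag inventory, model sizes,
# and example sequences below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

POS_TAGS = ["<pad>", "<s>", "</s>", "NN", "VB", "JJ", "RB", "DT", "IN", "PRP"]
TAG2ID = {tag: i for i, tag in enumerate(POS_TAGS)}


class POSCollocationLM(nn.Module):
    """Recurrent LM that predicts the next POS tag from the tags seen so far."""

    def __init__(self, n_tags: int, emb_dim: int = 32, hidden_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(n_tags, emb_dim, padding_idx=0)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_tags)

    def forward(self, tag_ids: torch.Tensor) -> torch.Tensor:
        # tag_ids: (batch, seq_len) -> logits over the next tag at each position
        hidden, _ = self.rnn(self.embed(tag_ids))
        return self.out(hidden)


def sequence_log_prob(model: POSCollocationLM, tags: list) -> float:
    """Log-probability of a tag sequence; low scores flag unlikely
    part-of-speech collocations in a candidate translation."""
    ids = torch.tensor([[TAG2ID["<s>"]] + [TAG2ID[t] for t in tags] + [TAG2ID["</s>"]]])
    logits = model(ids[:, :-1])            # predict each next tag
    log_probs = F.log_softmax(logits, dim=-1)
    targets = ids[:, 1:].unsqueeze(-1)     # gold next tags
    return log_probs.gather(-1, targets).sum().item()


if __name__ == "__main__":
    model = POSCollocationLM(len(POS_TAGS))
    # Untrained scores are arbitrary; after training on POS-tagged parallel
    # data, a plausible pattern such as "DT JJ NN" should outscore an
    # implausible collocation such as "DT DT VB".
    print(sequence_log_prob(model, ["DT", "JJ", "NN"]))
    print(sequence_log_prob(model, ["DT", "DT", "VB"]))
```

An n-gram statistical baseline of the kind compared in the paper would estimate the same next-tag probabilities from smoothed tag n-gram counts rather than from a learned recurrent state.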

    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 8
    August 2024, 343 pages
    EISSN: 2375-4702
    DOI: 10.1145/3613611

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 07 August 2024
    Online AM: 21 September 2023
    Accepted: 16 September 2023
    Revised: 07 August 2023
    Received: 14 April 2023
    Published in TALLIP Volume 23, Issue 8

    Author Tags

    1. Machine translation
    2. part-of-speech collocation rules
    3. machine learning
    4. language model
    5. neural networks

    Qualifiers

    • Research-article
