short-paper

Linguistic-Relationships-Based Approach for Improving Word Alignment

Published: 14 October 2017

Abstract

Unsupervised word alignment models, such as those implemented in GIZA++, are widely used in phrase-based statistical machine translation. Their quality depends heavily on the size and quality of the bilingual training corpus. For low-resource language pairs such as Chinese-Vietnamese, unsupervised word alignment is therefore sometimes of low quality because of data sparsity. Moreover, these models do not exploit linguistic relationships between the two languages to improve alignment. Chinese and Vietnamese belong to the same language type and are closely related linguistically. In this article, we integrate two such linguistic relationships, Sino-Vietnamese words and content words, into the word alignment model to improve the quality of Chinese-Vietnamese word alignment. Experimental results show that our method improves both word alignment performance and machine translation quality.
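To make the idea of feeding a linguistic relationship into an unsupervised aligner more concrete, here is a minimal sketch. It is not the authors' model. It assumes IBM Model 1 trained with EM (the simplest of the IBM models that GIZA++ implements) and assumes the Sino-Vietnamese relationship is available as a hypothetical lexicon of (Vietnamese word, Chinese word) pairs whose posterior scores are multiplied by a hypothetical `prior_weight` during the E-step and during Viterbi alignment. The toy corpus, `SINO_VIET`, and all function names are illustrative, and Vietnamese input is assumed to be word-segmented already, with the syllables of one word joined by underscores.

```python
from collections import defaultdict

NULL = "<null>"  # empty-word position, as in the IBM models


def _boost(v, c, prior_pairs, prior_weight):
    # Hypothetical prior: up-weight (Vietnamese, Chinese) pairs found in the lexicon.
    return prior_weight if (v, c) in prior_pairs else 1.0


def train_ibm1(corpus, prior_pairs=frozenset(), prior_weight=2.0, iterations=5):
    """EM for IBM Model 1 translation probabilities t[(v, c)] = p(v | c).

    corpus: list of (chinese_tokens, vietnamese_tokens) sentence pairs.
    """
    vi_vocab = {v for _, vi in corpus for v in vi}
    uniform = 1.0 / len(vi_vocab)  # uniform initialisation for the first iteration
    t = {}
    for _ in range(iterations):
        count = defaultdict(float)  # expected link counts per (v, c)
        total = defaultdict(float)  # expected counts per Chinese word c (incl. NULL)
        for zh, vi in corpus:
            candidates = [NULL] + zh
            for v in vi:
                # E-step: posterior over which Chinese word generated v,
                # with lexicon pairs up-weighted by the hypothetical prior.
                scores = [t.get((v, c), uniform) * _boost(v, c, prior_pairs, prior_weight)
                          for c in candidates]
                z = sum(scores)
                for c, s in zip(candidates, scores):
                    count[(v, c)] += s / z
                    total[c] += s / z
        # M-step: renormalise so that p(v | c) sums to 1 over v for each c.
        t = {pair: n / total[pair[1]] for pair, n in count.items()}
    return t


def viterbi_align(zh, vi, t, prior_pairs=frozenset(), prior_weight=2.0):
    """Return (chinese_index, vietnamese_index) links; links to NULL are dropped."""
    candidates = [NULL] + zh
    links = []
    for j, v in enumerate(vi):
        scores = [t.get((v, c), 0.0) * _boost(v, c, prior_pairs, prior_weight)
                  for c in candidates]
        best = max(range(len(candidates)), key=scores.__getitem__)
        if best > 0:  # index 0 is the NULL word
            links.append((best - 1, j))
    return links


# Toy data (illustrative only).
CORPUS = [
    (["我", "是", "学生"], ["tôi", "là", "học_sinh"]),
    (["他", "是", "学生"], ["anh_ấy", "là", "học_sinh"]),
    (["国家", "经济"], ["kinh_tế", "quốc_gia"]),
]
# Hypothetical Sino-Vietnamese lexicon: (Vietnamese word, Chinese source word).
SINO_VIET = frozenset({("học_sinh", "学生"), ("quốc_gia", "国家"), ("kinh_tế", "经济")})

if __name__ == "__main__":
    t = train_ibm1(CORPUS, prior_pairs=SINO_VIET)
    for zh, vi in CORPUS:
        print(zh, vi, viterbi_align(zh, vi, t, prior_pairs=SINO_VIET))
```

In the third toy pair, 国家 and 经济 each co-occur only with kinh_tế and quốc_gia, so EM on its own has no evidence to separate them and their link probabilities stay tied. The lexicon boost breaks that tie in favour of the Sino-Vietnamese correspondences, which is the kind of gain sparse data alone cannot provide.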

    Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 17, Issue 1
      March 2018
      152 pages
      ISSN: 2375-4699
      EISSN: 2375-4702
      DOI: 10.1145/3141228

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 14 October 2017
      • Accepted: 1 August 2017
      • Revised: 1 April 2017
      • Received: 1 October 2016


      Qualifiers

      • short-paper
      • Research
      • Refereed
