short-paper

Linguistic-Relationships-Based Approach for Improving Word Alignment

Authors:
Phuoc Tran, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Dien Dinh, VNU University of Science, Ho Chi Minh City, Vietnam
Tan Le, Universite Du Quebec A Montreal, Montreal, Canada
Long H. B. Nguyen, VNU University of Science, Ho Chi Minh City, Vietnam

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 17, Issue 1, Article No. 5, pp. 1–16. https://doi.org/10.1145/3133323

Published: 14 October 2017

Abstract

Unsupervised word alignment tools (such as GIZA++) are widely used in phrase-based statistical machine translation. The quality of the alignment model improves with the size and quality of the bilingual corpus. For low-resource language pairs such as Chinese and Vietnamese, however, unsupervised word alignment is sometimes of low quality because the data are sparse. In addition, such models do not exploit linguistic relationships to improve word alignment performance. Chinese and Vietnamese belong to the same language type and have close linguistic relationships. In this article, we integrate the characteristics of these linguistic relationships into the word alignment model to enhance the quality of Chinese-Vietnamese word alignment. The linguistic relationships we use are Sino-Vietnamese words and content words. The experimental results show that our method improves word alignment performance as well as machine translation quality.

Index Terms

Linguistic-Relationships-Based Approach for Improving Word Alignment
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Machine translation

Published in
ACM Transactions on Asian and Low-Resource Language Information Processing Volume 17, Issue 1
March 2018
152 pages
ISSN:2375-4699
EISSN:2375-4702
DOI:10.1145/3141228
Editor:
Nianwen Xue
Brandeis University, Waltham, USA
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 14 October 2017
- Accepted: 1 August 2017
- Revised: 1 April 2017
- Received: 1 October 2016
Author Tags
Chinese-Vietnamese machine translation
Sino-Vietnamese
Word alignment
content word
linguistic relationships
Qualifiers
- short-paper
- Research
- Refereed