Abstract
Unsupervised word alignment models (such as those implemented in GIZA++) are widely used in phrase-based statistical machine translation. The quality of such a model depends on the size and quality of the bilingual corpus. For low-resource language pairs such as Chinese and Vietnamese, however, unsupervised word alignment is sometimes of low quality because the data are sparse. In addition, these models do not take advantage of linguistic relationships between the two languages to improve word alignment performance. Chinese and Vietnamese are of the same language type and share close linguistic relationships. In this article, we integrate the characteristics of these linguistic relationships into the word alignment model to improve the quality of Chinese-Vietnamese word alignment. The relationships we use are Sino-Vietnamese words and content words. Experimental results show that our method improves word alignment performance as well as machine translation quality.
Index Terms
- Linguistic-Relationships-Based Approach for Improving Word Alignment