Bilingual Segmentation for Alignment and Translation

Huang, Chung-Chi; Chen, Wei-Teh; Chang, Jason S.

doi:10.1007/978-3-540-78135-6_38


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4919)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics


Abstract

We propose a method that bilingually segments sentences in languages with no clear delimiter for word boundaries. In our model, we first convert the search for the segmentation into a sequential tagging problem, which admits a polynomial-time dynamic-programming solution, and incorporate a control that balances the monolingual and bilingual information at hand. Our bilingual segmentation algorithm, which integrates a monolingual language model and a statistical translation model, is devised to tokenize sentences more suitably for bilingual applications such as word alignment and machine translation. Empirical results show that bilingually motivated segmenters outperform a purely monolingual one in both word alignment (12% reduction in error rate) and translation (5% improvement in BLEU), suggesting that monolingual segmentation is useful in some respects but, in a sense, is not built for bilingual research.
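
The abstract does not give the model's details, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It treats segmentation as a choice of word boundaries (equivalent to tagging each character as starting or continuing a word), finds the best choice with a dynamic program that is polynomial in the sentence length, and uses a weight `lam` as a stand-in for the control that balances monolingual and bilingual information. The function name `segment`, the scoring callables `mono_logp` and `bi_logp`, the `max_len` limit, and the toy scores are all hypothetical.

```python
import math

def segment(chars, mono_logp, bi_logp, lam=0.5, max_len=4):
    """Return the best segmentation of `chars` under an interpolated score.

    mono_logp(word): assumed monolingual log-score of a candidate word.
    bi_logp(word):   assumed bilingual log-score (e.g. how well the word
                     translates into the paired sentence).
    lam:             hypothetical balance weight; 1.0 = monolingual only.
    """
    n = len(chars)
    best = [-math.inf] * (n + 1)  # best[i]: top score for chars[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = chars[start:end]
            score = (best[start]
                     + lam * mono_logp(word)
                     + (1.0 - lam) * bi_logp(word))
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow the back-pointers from the end of the sentence.
    words = []
    i = n
    while i > 0:
        words.append(chars[back[i]:i])
        i = back[i]
    return words[::-1]

# Toy scores (invented for illustration): the monolingual model prefers
# "ab" + "cd", while the bilingual evidence prefers "abc" + "d".
MONO = {"ab": -1.0, "cd": -1.0, "a": -3.0, "b": -3.0,
        "c": -3.0, "d": -3.0, "abc": -6.0}
BI = {"abc": -1.0, "d": -1.0, "ab": -4.0, "cd": -4.0}

def mono_logp(word):
    return MONO.get(word, -10.0)  # flat penalty for unseen candidates

def bi_logp(word):
    return BI.get(word, -10.0)

print(segment("abcd", mono_logp, bi_logp, lam=1.0))  # ['ab', 'cd']
print(segment("abcd", mono_logp, bi_logp, lam=0.0))  # ['abc', 'd']
```

With `lam` at either extreme, the sketch reduces to a purely monolingual or purely bilingual segmenter; intermediate values trade the two sources of evidence against each other.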



Author information

Authors and Affiliations

Information Systems and Applications, NTHU, HsingChu, Taiwan R.O.C., 300
Chung-Chi Huang, Wei-Teh Chen & Jason S. Chang


Editor information

Alexander Gelbukh


Cite this paper

Huang, CC., Chen, WT., Chang, J.S. (2008). Bilingual Segmentation for Alignment and Translation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2008. Lecture Notes in Computer Science, vol 4919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78135-6_38


DOI: https://doi.org/10.1007/978-3-540-78135-6_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78134-9
Online ISBN: 978-3-540-78135-6
eBook Packages: Computer Science, Computer Science (R0)
