Abstract
We propose a method for bilingually segmenting sentences in languages that have no explicit delimiters marking word boundaries. In our model, we first cast the search for a segmentation as a sequential tagging problem, which admits a polynomial-time dynamic-programming solution, and we incorporate a control that balances the monolingual and bilingual information at hand. Our bilingual segmentation algorithm, which integrates a monolingual language model with a statistical translation model, is designed to tokenize sentences in a way better suited to bilingual applications such as word alignment and machine translation. Empirical results show that bilingually motivated segmenters outperform a purely monolingual one in both word alignment (a 12% reduction in error rate) and translation (a 5% improvement in BLEU). This suggests that monolingual segmentation, though useful in some respects, is in a sense not built for bilingual research.
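The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each candidate word gets a log-probability from a hypothetical monolingual table (`mono_logprob`) and from a hypothetical bilingual, translation-derived table (`bi_logprob`). It also assumes that the "control" is a linear interpolation weight `lam`. A dynamic program over word end positions then finds the best-scoring segmentation in O(n · `max_len`) time, and the result is emitted as per-character B/I tags, one common tagging scheme assumed here. All names, the unknown-word penalty, and the toy tables are illustrative.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy tables: word -> log-probability (illustrative values only).
MONO_LOGPROB: Dict[str, float] = {
    "机器翻译": -5.0, "机器": -3.0, "翻译": -3.5,
    "机": -6.0, "器": -7.0, "翻": -6.5, "译": -7.5,
}
BI_LOGPROB: Dict[str, float] = {"机器翻译": -12.0, "机器": -2.0, "翻译": -2.0}


def segment(
    sentence: str,
    mono_logprob: Dict[str, float],
    bi_logprob: Dict[str, float],
    lam: float = 0.5,         # assumed weight on monolingual evidence
    max_len: int = 4,         # assumed maximum word length in characters
    unk_logprob: float = -20.0,  # assumed penalty for unseen words
) -> Tuple[List[str], List[str]]:
    """Return (words, B/I tags) maximising the interpolated word score

        lam * log P_mono(w) + (1 - lam) * log P_bi(w)

    summed over the words of the segmentation.
    """
    n = len(sentence)
    best = [-math.inf] * (n + 1)  # best[i]: best score for sentence[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            w = sentence[j:i]
            score = (lam * mono_logprob.get(w, unk_logprob)
                     + (1 - lam) * bi_logprob.get(w, unk_logprob))
            if best[j] + score > best[i]:
                best[i] = best[j] + score
                back[i] = j
    # Follow back-pointers from the end to recover the words.
    words: List[str] = []
    i = n
    while i > 0:
        j = back[i]
        words.append(sentence[j:i])
        i = j
    words.reverse()
    # Tag the first character of each word "B" and the rest "I".
    tags = [t for w in words for t in ["B"] + ["I"] * (len(w) - 1)]
    return words, tags


if __name__ == "__main__":
    for lam in (1.0, 0.5, 0.0):
        print(lam, segment("机器翻译", MONO_LOGPROB, BI_LOGPROB, lam=lam))
```

With these toy tables, `lam=1.0` (monolingual evidence only) keeps 机器翻译 as one word. Shifting weight toward the bilingual table (`lam=0.5` or `0.0`) splits it into 机器 / 翻译, which map more directly onto separate English words. That shift is the kind of trade-off the control is described as balancing.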
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huang, CC., Chen, WT., Chang, J.S. (2008). Bilingual Segmentation for Alignment and Translation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2008. Lecture Notes in Computer Science, vol 4919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78135-6_38
DOI: https://doi.org/10.1007/978-3-540-78135-6_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78134-9
Online ISBN: 978-3-540-78135-6
eBook Packages: Computer Science, Computer Science (R0)