Abstract
Sentence alignment plays an extremely important role in machine translation. Most of the hybrid approaches get either a bad recall or low precision. We tackle disadvantages of several novel sentence alignment approaches, which combine length-based and word correspondences. Word clustering is applied in our method in order to improve the quality of the sentence aligner, especially when dealing with the sparse data problem. Our approach overcomes the limits of previous hybrid methods and obtains both highly recall and reasonable precision rates.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Braune, F., Fraser, A.: Improved unsupervised sentence alignment for symmetrical and asymmetrical parallel corpora. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 81–89 (2010)
Brown, P.F., Lai, J.C., Mercer, R.L.: Aligning sentences in parallel corpora. In: Proceedings of the 29th Annual Meeting on Association for Computational Linguistics, Berkeley, California, pp. 169–176 (1991)
Brown, P.F., Desouza, P.V., Mercer, R.L., Pietra, V.J.D., Lai, J.C.: Class-based n-gram models of natural language. Computational Linguistics 18(4), 467–479 (1992)
Chen, S.F.: Aligning sentences in bilingual corpora using lexical information. In: Proceedings of the 31st Annual Meeting on Association for Computational Linguistics, pp. 9–16 (1993)
Gale, W.A., Church, K.W.: A program for aligning sentences in bilingual corpora. Computational Linguistics 19(1), 75–102 (1993)
Kay, M., Röscheisen, M.: Text-translation alignment. Computational Linguistics 19(1), 121–142 (1993)
Ma, X.: Champollion: a robust parallel text sentence aligner. In: LREC 2006: Fifth International Conference on Language Resources and Evaluation, pp. 489–492 (2006)
Melamed, I.D.: A geometric approach to mapping bitext correspondence. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1–12 (1996)
Fattah, M.A.: The Use of MSVM and HMM for Sentence Alignment. JIPS 8(2), 301–314 (2012)
Moore, R.C.: Fast and Accurate Sentence Alignment of Bilingual Corpora. In: Proceedings of the 5th Conference of the Association for Machine Translation in the Americas on Machine Translation: From Research to Real Users, pp. 135–144 (2002)
Sennrich, R., Volk, M.: MT-based sentence alignment for OCR-generated parallel texts. In: The Ninth Conference of the Association for Machine Translation in the Americas (AMTA 2010), Denver, Colorado (2010)
Varga, D., Németh, L., Halácsy, P., Kornai, A., Trón, V., Nagy, V.: Parallel corpora for medium density languages. In: Proceedings of the RANLP 2005, pp. 590–596 (2005)
Wu, D.: Aligning a parallel English-Chinese corpus statistically with lexical criteria. In: Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, pp. 80–87 (1994)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Trieu, HL., Nguyen, PT., Nguyen, KA. (2014). Improving Moore’s Sentence Alignment Method Using Bilingual Word Clustering. In: Huynh, V., Denoeux, T., Tran, D., Le, A., Pham, S. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 244. Springer, Cham. https://doi.org/10.1007/978-3-319-02741-8_14
Download citation
DOI: https://doi.org/10.1007/978-3-319-02741-8_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02740-1
Online ISBN: 978-3-319-02741-8
eBook Packages: EngineeringEngineering (R0)