Abstract
The merit of phrase-based statistical machine translation is often reduced by the complexity of constructing it. In this paper, we address some issues in phrase-based statistical machine translation, namely the size of the phrase translation table, the use of the underlying translation model probability, and the length of the phrase unit. We present the Level-Of-Detail (LOD) approach, an agglomerative approach for learning phrase-level alignment. Our experiments show that the LOD approach significantly improves the performance of the word-based approach. LOD demonstrates a clear advantage: the phrase translation table grows only sub-linearly with the maximum phrase length, while performance remains comparable to that of other phrase-based approaches.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Setiawan, H., Li, H., Zhang, M., Ooi, B.C. (2005). Phrase-Based Statistical Machine Translation: A Level of Detail Approach. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_51
DOI: https://doi.org/10.1007/11562214_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science (R0)