Abstract
We describe an approach to improving the performance of the sampling-based sub-sentential alignment method on translation tasks by investigating the distribution of n-grams in the phrase tables it produces. The approach consists in enforcing the alignment of n-grams. We compare the quality of the phrase translation tables output by this approach with that of the state-of-the-art estimation approach on statistical machine translation tasks. We report significant improvements for this approach and show that merging phrase tables outperforms the state-of-the-art techniques.
A. Lardilleux – The work was done while the author was at TLP Group, LIMSI-CNRS, France.
Notes
- 1.
- 2. Contrary to the widely used terminology, where “alignment” denotes a set of links between the source and target words of a sentence pair, we call “alignment” a (source, target) phrase pair, i.e., an entry in the so-called [phrase] translation tables.
- 3. Option -N 1 in the program.
Acknowledgments
Part of the research presented in this paper has been done under a Japanese grant-in-aid (Kakenhi C, 23500187: Improvement of alignments and release of multilingual syntactic patterns for statistical and example-based machine translation).
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Luo, J., Lardilleux, A., Lepage, Y. (2014). Improving the Distribution of N-Grams in Phrase Tables Obtained by the Sampling-Based Method. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_34
DOI: https://doi.org/10.1007/978-3-319-08958-4_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08957-7
Online ISBN: 978-3-319-08958-4
eBook Packages: Computer Science, Computer Science (R0)