Improving the Distribution of N-Grams in Phrase Tables Obtained by the Sampling-Based Method

  • Conference paper
Human Language Technology Challenges for Computer Science and Linguistics (LTC 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8387)

Abstract

We describe an approach to improving the performance of the sampling-based sub-sentential alignment method on translation tasks by investigating the distribution of n-grams in phrase tables. The approach consists in enforcing the alignment of n-grams. We compare the quality of the phrase translation tables output by this approach with that of the tables produced by the state-of-the-art estimation approach on statistical machine translation tasks. We report significant improvements for this approach and show that merging phrase tables outperforms state-of-the-art techniques.

A. Lardilleux – The work was done while the author was at TLP Group, LIMSI-CNRS, France.

Notes

  1. http://anymalign.limsi.fr/

  2. Contrary to the widely used terminology where it denotes a set of links between the source and target words of a sentence pair, we call “alignment” a (source, target) phrase pair, i.e., it corresponds to an entry in the so-called [phrase] translation tables.

  3. Option -N 1 in the program.

Acknowledgments

Part of the research presented in this paper was carried out under a Japanese grant-in-aid (Kakenhi C, 23500187: Improvement of alignments and release of multilingual syntactic patterns for statistical and example-based machine translation).

Author information

Corresponding author

Correspondence to Juan Luo.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Luo, J., Lardilleux, A., Lepage, Y. (2014). Improving the Distribution of N-Grams in Phrase Tables Obtained by the Sampling-Based Method. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_34

  • DOI: https://doi.org/10.1007/978-3-319-08958-4_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08957-7

  • Online ISBN: 978-3-319-08958-4

  • eBook Packages: Computer Science, Computer Science (R0)
