Marker-Based Chunking in Eleven European Languages for Analogy-Based Translation

  • Conference paper
Human Language Technology Challenges for Computer Science and Linguistics (LTC 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8387)

Abstract

An example-based machine translation (EBMT) system based on proportional analogies requires numerous proportional analogies between linguistic units to work properly. Consequently, long sentences cannot be handled directly in such a framework. Cutting sentences into chunks would be a solution. Using different markers, we count the number of proportional analogies between chunks in 11 European languages. As expected, the number of proportional analogies between chunks found is very high. These results, and preliminary experiments in translation, are promising for the EBMT system that we intend to build.
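
The chunking step mentioned above can be illustrated with a minimal sketch. It assumes the common marker-hypothesis convention that a chunk starts at a marker word and must contain at least one non-marker word. The marker list `MARKERS` and the function name `marker_chunks` are hypothetical and chosen for illustration only; the abstract does not state which markers were used for each language.

```python
# Hypothetical English marker list (determiners, prepositions, a conjunction).
# This is an assumption for illustration, not the marker set used in the paper.
MARKERS = {"the", "a", "an", "of", "in", "on", "to", "for", "with", "and"}


def marker_chunks(sentence, markers=MARKERS):
    """Cut a sentence into chunks, opening a new chunk at each marker word.

    A new chunk is opened only if the current chunk already holds at least
    one non-marker word, so consecutive markers stay in the same chunk.
    """
    chunks, current = [], []
    for word in sentence.split():
        is_marker = word.lower() in markers
        has_content = any(w.lower() not in markers for w in current)
        if is_marker and has_content:
            chunks.append(current)
            current = []
        current.append(word)
    if current:
        chunks.append(current)
    return [" ".join(chunk) for chunk in chunks]


print(marker_chunks("the committee voted on the proposal in the morning"))
```

Under these assumptions the example sentence is cut into three chunks, each opened by a marker word. Proportional analogies would then be searched among such chunks rather than among whole sentences.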

This paper is part of the outcome of research performed under a Waseda University Grant for Special Research Project (project number: 2010A-906).

Author information

Corresponding author

Correspondence to Yves Lepage.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Takeya, K., Lepage, Y. (2014). Marker-Based Chunking in Eleven European Languages for Analogy-Based Translation. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_35

  • DOI: https://doi.org/10.1007/978-3-319-08958-4_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08957-7

  • Online ISBN: 978-3-319-08958-4

  • eBook Packages: Computer Science (R0)
