Marker-Based Chunking in Eleven European Languages for Analogy-Based Translation

Takeya, Kota; Lepage, Yves

doi:10.1007/978-3-319-08958-4_35

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8387)

Included in the following conference series:

Language and Technology Conference

Abstract

An example-based machine translation (EBMT) system based on proportional analogies requires numerous proportional analogies between linguistic units to work properly. Consequently, long sentences cannot be handled directly in such a framework. Cutting sentences into chunks would be a solution. Using different markers, we count the number of proportional analogies between chunks in 11 European languages. As expected, the number of proportional analogies found between chunks is very high. These results, and preliminary experiments in translation, are promising for the EBMT system that we intend to build.
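
As an illustration of what cutting a sentence into chunks with markers can look like, the following is a minimal sketch and not the authors' implementation. It assumes a left-marker policy (a chunk begins at a marker word) and requires each closed chunk to contain at least one non-marker word. The marker set `EXAMPLE_MARKERS` and the function name `marker_chunks` are hypothetical; the markers actually used in the paper are not listed in this abstract.

```python
from typing import Iterable, List, Set

# Hypothetical English marker set, for illustration only.
# The paper's actual markers for its 11 languages are not given here.
EXAMPLE_MARKERS: Set[str] = {"the", "a", "of", "in", "to", "and", "that"}


def marker_chunks(tokens: Iterable[str], markers: Set[str]) -> List[List[str]]:
    """Split a token sequence into chunks that start at marker words.

    Assumption: a marker opens a new chunk only if the current chunk
    already holds at least one non-marker word, so consecutive markers
    (e.g. "of the") stay together at the head of the same chunk.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    has_content = False  # True once the current chunk holds a non-marker word

    for tok in tokens:
        is_marker = tok.lower() in markers
        if is_marker and has_content:
            # Close the current chunk and start a new one at this marker.
            chunks.append(current)
            current = []
            has_content = False
        current.append(tok)
        if not is_marker:
            has_content = True

    if current:
        # The last chunk is kept even if it consists of markers only.
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sentence = "the committee adopted the report of the rapporteur in the morning"
    for chunk in marker_chunks(sentence.split(), EXAMPLE_MARKERS):
        print(" ".join(chunk))
```

With the assumed marker set, the example sentence is cut into four chunks: "the committee adopted", "the report", "of the rapporteur", and "in the morning". Chunks of this size are the units between which proportional analogies would then be counted.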

This paper is part of the outcome of research performed under a Waseda University Grant for Special Research Project (project number: 2010A-906).

Author information

Authors and Affiliations

IPS, Waseda University, Hibikino 2-7, Kitakyushu, Fukuoka, 808-0135, Japan
Kota Takeya & Yves Lepage

Corresponding author

Correspondence to Yves Lepage.

Editor information

Editors and Affiliations

Adam Mickiewicz University, Poznań, Poland
Zygmunt Vetulani
IMMI-CNRS, Orsay, France
Joseph Mariani

About this paper

Cite this paper

Takeya, K., Lepage, Y. (2014). Marker-Based Chunking in Eleven European Languages for Analogy-Based Translation. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_35

DOI: https://doi.org/10.1007/978-3-319-08958-4_35
Published: 26 July 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08957-7
Online ISBN: 978-3-319-08958-4
eBook Packages: Computer Science, Computer Science (R0)