Abstract
An example-based machine translation (EBMT) system based on proportional analogies needs a large number of proportional analogies between linguistic units to work properly. Consequently, long sentences cannot be handled directly in such a framework. Cutting sentences into chunks is one possible solution. Using different markers, we count the number of proportional analogies between chunks in 11 European languages. As expected, we find a very high number of proportional analogies between chunks. These results, together with preliminary translation experiments, are promising for the EBMT system that we intend to build.
This paper is part of the outcome of research performed under a Waseda University Grant for Special Research Project (project number: 2010A-906).
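The two ideas named in the abstract, marker-based chunking and proportional analogies between chunks, can be illustrated with a minimal sketch. It is not the authors' implementation. The marker list `MARKERS` and the function names `marker_chunks` and `may_form_analogy` are hypothetical and chosen only for illustration. The chunker opens a new chunk at each marker word once the current chunk holds at least one non-marker word. The analogy check tests only a character-count condition that is necessary, but not sufficient, for a proportional analogy a : b :: c : d between strings.

```python
from collections import Counter

# Hypothetical marker list for illustration only; the markers actually used
# in the paper are not listed in this abstract.
MARKERS = {"the", "a", "of", "in", "to", "and", "for", "on", "that"}


def marker_chunks(sentence, markers=MARKERS):
    """Split a sentence into chunks, opening a new chunk at each marker word.

    A chunk is closed only after it contains at least one non-marker word,
    so consecutive markers stay together in the same chunk.
    """
    chunks, current, has_content = [], [], False
    for word in sentence.split():
        is_marker = word.lower() in markers
        if is_marker and has_content:
            chunks.append(" ".join(current))
            current, has_content = [], False
        current.append(word)
        if not is_marker:
            has_content = True
    if current:
        chunks.append(" ".join(current))
    return chunks


def may_form_analogy(a, b, c, d):
    """Necessary (not sufficient) test for the analogy a : b :: c : d.

    For every character, its count in a plus its count in d must equal
    its count in b plus its count in c.
    """
    return Counter(a) + Counter(d) == Counter(b) + Counter(c)


if __name__ == "__main__":
    print(marker_chunks("the cat sat on the mat in the house"))
    print(may_form_analogy("in the house", "in the houses",
                           "on the mat", "on the mats"))
```

Counting how many quadruples of chunks pass such a filter gives only an upper bound on the number of analogies, because a full test would also have to verify further constraints on the strings.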
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Takeya, K., Lepage, Y. (2014). Marker-Based Chunking in Eleven European Languages for Analogy-Based Translation. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_35
DOI: https://doi.org/10.1007/978-3-319-08958-4_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08957-7
Online ISBN: 978-3-319-08958-4
eBook Packages: Computer Science (R0)