Abstract
Previous efforts to identify idiomatic expressions in a bilingual parallel corpus have relied on word alignments to capture the senses of individual words. In this paper, we propose using phrase alignments rather than word alignments in a parallel corpus, so that the senses of phrases as well as words can be recognized. Our proposed scoring functions are based on the difference in translation tendency between a phrase and its individual words. They identify idiomatic expressions using the entropy variation and the translation difference between a phrase and its individual words. Experimental results show that our proposed method is more effective than previous approaches at identifying idiomatic expressions. In addition, we show that linguistic constraints can be integrated into our method to improve performance.
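The abstract does not state the scoring functions themselves, so the following is only a minimal illustrative sketch of the entropy idea, not the formulation used in the paper. It assumes that translation counts for a phrase and for each of its component words have already been collected from phrase alignments. The function names `translation_entropy` and `entropy_gap`, and the toy counts, are hypothetical. The intuition sketched here is that a phrase translated consistently as a unit, while its words translate in many different ways on their own, is a plausible idiom candidate.

```python
import math
from collections import Counter


def translation_entropy(counts):
    """Shannon entropy (in bits) of a translation distribution given raw counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values() if c > 0
    )


def entropy_gap(phrase_counts, word_counts_list):
    """Mean entropy of the component words minus the entropy of the phrase.

    Assumption for illustration only: a larger gap suggests the phrase is
    translated more consistently than its words are individually.
    """
    mean_word_entropy = sum(
        translation_entropy(c) for c in word_counts_list
    ) / len(word_counts_list)
    return mean_word_entropy - translation_entropy(phrase_counts)


# Toy, made-up counts: the phrase always maps to one target phrase,
# while its two component words map to 2 and 4 equally likely targets.
phrase_counts = Counter({"target_phrase": 8})
word_counts_list = [
    Counter({"t1": 1, "t2": 1}),
    Counter({"t3": 1, "t4": 1, "t5": 1, "t6": 1}),
]
gap = entropy_gap(phrase_counts, word_counts_list)
```

In practice the counts would come from a phrase table produced by an aligner, and any threshold on the gap would need to be tuned on held-out data; both are outside what the abstract specifies.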
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, HG., Kim, MJ., Hong, G., Kim, SB., Hwang, YS., Rim, HC. (2010). Identifying Idiomatic Expressions Using Phrase Alignments in Bilingual Parallel Corpus. In: Zhang, BT., Orgun, M.A. (eds) PRICAI 2010: Trends in Artificial Intelligence. PRICAI 2010. Lecture Notes in Computer Science, vol 6230. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15246-7_14
DOI: https://doi.org/10.1007/978-3-642-15246-7_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15245-0
Online ISBN: 978-3-642-15246-7
eBook Packages: Computer Science (R0)