Identifying Idiomatic Expressions Using Phrase Alignments in Bilingual Parallel Corpus

Lee, Hyoung-Gyu; Kim, Min-Jeong; Hong, Gumwon; Kim, Sang-Bum; Hwang, Young-Sook; Rim, Hae-Chang

doi:10.1007/978-3-642-15246-7_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6230)

Included in the following conference series:

Pacific Rim International Conference on Artificial Intelligence

Abstract

Previous efforts to identify idiomatic expressions using a bilingual parallel corpus have relied on word alignments to capture the sense of individual words. In this paper, we propose using phrase alignments rather than word alignments in a parallel corpus, so that the sense of a phrase can be recognized as well as the senses of its words. Our proposed scoring functions are based on the difference in translation tendency between a phrase and its individual words: they identify idiomatic expressions through an entropy variation and a translation difference between the phrase and its words. Experimental results show that the proposed method is more effective than previous approaches at identifying idiomatic expressions. In addition, we show that linguistic constraints can be integrated into our method to improve performance.
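
The abstract does not give the scoring functions themselves, so the following is only a rough illustration of the entropy idea, not the authors' formulation. It assumes that translation counts for a phrase and for each of its words have already been extracted from aligned data, and it compares the Shannon entropy of the two kinds of translation distribution. The function names, the `entropy_gap` score, and the toy counts are all hypothetical.

```python
import math
from collections import Counter

def translation_entropy(counts):
    """Shannon entropy (in bits) of a translation distribution.

    `counts` maps each target-side translation to the number of times
    it was aligned to the source unit (a word or a phrase).
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)

def entropy_gap(phrase_counts, word_counts_list):
    """Hypothetical score: mean word entropy minus phrase entropy.

    A large positive value means the phrase is translated far more
    consistently than its words are, which is one possible signal of
    idiomatic (non-compositional) use. This is an assumption made for
    illustration, not the scoring function from the paper.
    """
    mean_word = sum(translation_entropy(c) for c in word_counts_list)
    mean_word /= len(word_counts_list)
    return mean_word - translation_entropy(phrase_counts)

# Invented toy counts: the phrase always maps to one target expression,
# while each content word splits evenly between two translations.
phrase = Counter({"target_idiom": 8})
words = [Counter({"t1": 4, "t2": 4}), Counter({"t3": 4, "t4": 4})]
gap = entropy_gap(phrase, words)
```

With these toy counts each word distribution is uniform over two translations and the phrase distribution is concentrated on a single translation, so the gap is positive. Real alignment counts would be noisier, and a practical score would likely need smoothing and frequency thresholds.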

Author information

Authors and Affiliations

Department of Computer and Radio Communications Engineering, Korea University, Seoul, Korea
Hyoung-Gyu Lee, Min-Jeong Kim, Gumwon Hong & Hae-Chang Rim
Convergence Technology Center, SK Telecom
Sang-Bum Kim & Young-Sook Hwang

Editor information

Editors and Affiliations

School of Computer Science and Engineering, Seoul National University, 151-744, Seoul, Korea
Byoung-Tak Zhang
Department of Computing, Macquarie University, NSW, Sydney, Australia
Mehmet A. Orgun

Cite this paper

Lee, HG., Kim, MJ., Hong, G., Kim, SB., Hwang, YS., Rim, HC. (2010). Identifying Idiomatic Expressions Using Phrase Alignments in Bilingual Parallel Corpus. In: Zhang, BT., Orgun, M.A. (eds) PRICAI 2010: Trends in Artificial Intelligence. PRICAI 2010. Lecture Notes in Computer Science, vol 6230. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15246-7_14

DOI: https://doi.org/10.1007/978-3-642-15246-7_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15245-0
Online ISBN: 978-3-642-15246-7
eBook Packages: Computer Science, Computer Science (R0)