Identifying Idiomatic Expressions Using Phrase Alignments in Bilingual Parallel Corpus

  • Conference paper
PRICAI 2010: Trends in Artificial Intelligence (PRICAI 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6230)

Abstract

Previous efforts to identify idiomatic expressions in a bilingual parallel corpus have relied on word alignments to capture the sense of individual words. In this paper, we propose using phrase alignments rather than word alignments in a parallel corpus, so that the sense of phrases as well as words can be recognized. Our proposed scoring functions are based on the difference in translation tendency between a phrase and its individual words: they identify idiomatic expressions through an entropy variation and a translation difference between the phrase and those words. Experimental results show that the proposed method is more effective than previous approaches at identifying idiomatic expressions. In addition, we show that linguistic constraints can be integrated into our method to improve performance.
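The abstract does not give the scoring functions themselves, so the following is only a minimal sketch of one possible reading of "entropy variation": compare the entropy of a phrase's translation distribution with the mean entropy of its component words' translation distributions. The function names, the direction of the difference, and the alignment counts are all assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def translation_entropy(counts):
    """Shannon entropy (in bits) of a translation-count distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log2(c / total)
        for c in counts.values()
        if c > 0
    )


def entropy_variation(phrase_counts, word_counts_list):
    """Mean word entropy minus phrase entropy.

    Hypothetical score, not the paper's formula: a large positive value
    means the phrase is translated more consistently than its words are.
    """
    mean_word_entropy = sum(
        translation_entropy(c) for c in word_counts_list
    ) / len(word_counts_list)
    return mean_word_entropy - translation_entropy(phrase_counts)


# Made-up alignment counts for illustration only (English to German).
phrase_counts = Counter({"sterben": 8, "den Eimer treten": 2})
word_counts_list = [
    Counter({"treten": 5, "kicken": 3, "stoßen": 2}),  # "kick"
    Counter({"Eimer": 9, "Kübel": 1}),                 # "bucket"
]

score = entropy_variation(phrase_counts, word_counts_list)
print(f"entropy variation: {score:.3f}")
```

In this toy setting the phrase has a more peaked translation distribution than its words on average, so the score is positive. A real system would take the counts from phrase and word alignments extracted from the parallel corpus and would combine this signal with a translation-difference measure, as the abstract describes.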

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, HG., Kim, MJ., Hong, G., Kim, SB., Hwang, YS., Rim, HC. (2010). Identifying Idiomatic Expressions Using Phrase Alignments in Bilingual Parallel Corpus. In: Zhang, BT., Orgun, M.A. (eds) PRICAI 2010: Trends in Artificial Intelligence. PRICAI 2010. Lecture Notes in Computer Science, vol 6230. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15246-7_14

  • DOI: https://doi.org/10.1007/978-3-642-15246-7_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15245-0

  • Online ISBN: 978-3-642-15246-7

  • eBook Packages: Computer Science, Computer Science (R0)
