A Hybrid Approach for Multiword Expression Identification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6001)

Abstract

Considerable attention has been given to the identification and treatment of Multiword Expressions (MWEs) in NLP tasks such as parsing and generation, with the aim of improving the quality of results. Statistical methods have often been employed for MWE identification as an inexpensive, language-independent way of finding co-occurrence patterns. More linguistically motivated methods, which employ information such as POS filters and lexical alignment between languages, can instead produce more targeted candidate lists. In this paper we propose a hybrid approach that combines the strengths of these different sources of information using a machine learning algorithm to produce more robust and precise results. Automatic evaluation on gold standards shows that the hybrid method outperforms the individual statistical and alignment-based MWE extraction approaches for both Portuguese and English. The method can aid lexicographic work by providing a more targeted MWE candidate list.
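The hybrid idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the association measures, the feature set, or the learning algorithm, so everything below is an assumption made for illustration: pointwise mutual information (PMI) stands in for the statistical source, a binary flag stands in for the alignment-based source, and a plain perceptron stands in for the machine learning algorithm. The function names and the toy feature values are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Pointwise mutual information for every adjacent word pair in `tokens`.

    Assumption: PMI is only one possible association measure; the abstract
    does not say which statistical measures the authors used.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


def build_features(candidates, pmi, aligned):
    """One feature vector per candidate: [PMI, alignment flag].

    `aligned` is a hypothetical set of candidates proposed by an
    alignment-based extractor; the flag is 1.0 if the candidate is in it.
    """
    return [[pmi.get(c, 0.0), 1.0 if c in aligned else 0.0] for c in candidates]


def train_perceptron(X, y, epochs=1000):
    """Train a perceptron (a stand-in learner, not the paper's algorithm)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for row, label in zip(X, y):
            target = 1 if label else -1
            activation = sum(wi * xi for wi, xi in zip(w, row)) + b
            if target * activation <= 0:  # misclassified: nudge the boundary
                w = [wi + target * xi for wi, xi in zip(w, row)]
                b += target
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return w, b


def predict(model, row):
    """Return True if the candidate is classified as an MWE."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, row)) + b > 0


# Invented toy data: [PMI, alignment flag] with gold labels (True = MWE).
# Neither feature separates the classes alone; a linear combination does.
TOY_X = [[3.0, 1.0], [1.5, 1.0], [3.5, 0.0], [1.0, 0.0], [2.0, 0.0], [0.25, 1.0]]
TOY_Y = [True, True, True, False, False, False]

if __name__ == "__main__":
    model = train_perceptron(TOY_X, TOY_Y)
    for row, gold in zip(TOY_X, TOY_Y):
        print(row, "predicted:", predict(model, row), "gold:", gold)
```

In the toy data, a candidate with a middling PMI is accepted only when the alignment flag supports it, which is the kind of interaction a learner can pick up when both sources are presented as features rather than applied as separate filters.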

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ramisch, C., de Medeiros Caseli, H., Villavicencio, A., Machado, A., Finatto, M.J. (2010). A Hybrid Approach for Multiword Expression Identification. In: Pardo, T.A.S., Branco, A., Klautau, A., Vieira, R., de Lima, V.L.S. (eds) Computational Processing of the Portuguese Language. PROPOR 2010. Lecture Notes in Computer Science, vol 6001. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12320-7_9

  • DOI: https://doi.org/10.1007/978-3-642-12320-7_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12319-1

  • Online ISBN: 978-3-642-12320-7

  • eBook Packages: Computer Science, Computer Science (R0)
