A Rapid Method to Extract Multiword Expressions with Statistic Measures and Linguistic Rules

  • Conference paper
Web Information Systems and Mining (WISM 2011)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6988))

Abstract

Multiword expressions (MWEs) have long been a bottleneck in NLP. In particular, a resource of fixed MWEs can improve the performance of NLP tasks and applications. Because of the complex characteristics of MWEs, it is hard to distinguish fixed MWEs from non-fixed ones. This paper puts forward an approach to extract fixed MWEs rapidly. First, a definition of fixed MWEs is given. Features that help determine fixed MWEs are then drawn from both statistical measures and linguistic information. We extract fixed MWEs within this multi-feature framework and evaluate the results manually. The experiment shows that the approach is effective. Our work can provide a desired list of fixed MWEs for NLP applications.
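
The abstract does not say which statistical measures or linguistic rules the paper uses, so the following is only a minimal sketch of the general idea of combining the two kinds of feature. It assumes pointwise mutual information (PMI) over adjacent word pairs as the statistical measure and a small set of part-of-speech patterns as the linguistic rule. The function name, thresholds, and the `ALLOWED_PATTERNS` set are hypothetical and not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical POS patterns treated as plausible MWE shapes.
# This is an illustrative assumption, not the paper's rule set.
ALLOWED_PATTERNS = {("ADJ", "NOUN"), ("NOUN", "NOUN")}


def extract_candidates(tagged_tokens, min_count=2, min_pmi=1.0):
    """Return (bigram, pmi) pairs that pass a frequency/PMI filter
    and a POS-pattern filter, sorted by descending PMI.

    tagged_tokens: list of (word, pos_tag) tuples in corpus order.
    """
    words = [w.lower() for w, _ in tagged_tokens]
    tags = [t for _, t in tagged_tokens]

    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))

    # Record the POS pattern of the first occurrence of each bigram.
    patterns = {}
    for i in range(len(words) - 1):
        patterns.setdefault((words[i], words[i + 1]), (tags[i], tags[i + 1]))

    n_uni = len(words)
    n_bi = max(len(words) - 1, 1)

    results = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # statistical filter: too rare to trust
        if patterns[(w1, w2)] not in ALLOWED_PATTERNS:
            continue  # linguistic filter: POS pattern not allowed
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        if pmi >= min_pmi:
            results.append(((w1, w2), pmi))

    return sorted(results, key=lambda item: -item[1])


if __name__ == "__main__":
    sample = [
        ("hot", "ADJ"), ("dog", "NOUN"), ("is", "VERB"), ("a", "DET"),
        ("hot", "ADJ"), ("dog", "NOUN"), ("of", "ADP"), ("the", "DET"),
        ("day", "NOUN"), ("of", "ADP"), ("the", "DET"), ("week", "NOUN"),
    ]
    for bigram, score in extract_candidates(sample):
        print(bigram, round(score, 2))
```

In this toy input, "of the" is as frequent as "hot dog" but is rejected by the POS filter, which shows why a purely statistical ranking may benefit from linguistic constraints. A real system would need a large tagged corpus, and the candidates would still need manual evaluation, as in the paper.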

This research was funded by Taiyuan University of Technology (item numbers 900103-03010255 and 900103-03020632).

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, L., Liu, R. (2011). A Rapid Method to Extract Multiword Expressions with Statistic Measures and Linguistic Rules. In: Gong, Z., Luo, X., Chen, J., Lei, J., Wang, F.L. (eds) Web Information Systems and Mining. WISM 2011. Lecture Notes in Computer Science, vol 6988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23982-3_30

  • DOI: https://doi.org/10.1007/978-3-642-23982-3_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23981-6

  • Online ISBN: 978-3-642-23982-3

  • eBook Packages: Computer Science, Computer Science (R0)
