Automatic Extraction of Fixed Multiword Expressions

Hore, Campbell; Asahara, Masayuki; Matsumoto, Yūji

doi:10.1007/11562214_50

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)

Included in the following conference series:

International Conference on Natural Language Processing

Abstract

Fixed multiword expressions are strings of words which together behave like a single word. This research establishes a method for the automatic extraction of such expressions. Our method involves three stages. In the first, a statistical measure is used to extract candidate bigrams. In the second, we use this list to select occurrences of candidate expressions in a corpus, together with their surrounding contexts. These examples are used as training data for supervised machine learning, resulting in a classifier which can identify target multiword expressions. The final stage is the estimation of the part of speech of each extracted expression based on its context of occurrence. Evaluation demonstrated that collocation measures alone are not effective in identifying target expressions. However, when trained on one million examples, the classifier identified target multiword expressions with precision greater than 90%. Part of speech estimation had precision and recall of over 95%.
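
The first stage described above, extracting candidate bigrams with a statistical measure, can be sketched as follows. This is a minimal illustration only: the abstract does not say which measure the authors used, so pointwise mutual information (PMI) is assumed here as one common association measure. The function name `candidate_bigrams`, the parameters `min_count` and `top_n`, and the toy token list are all hypothetical.

```python
import math
from collections import Counter


def candidate_bigrams(tokens, min_count=2, top_n=10):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Assumption: PMI stands in for the unspecified "statistical measure";
    min_count and top_n are illustrative thresholds, not values from the paper.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unreliable
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scored.append(((w1, w2), math.log2(p_xy / (p_x * p_y))))

    # Highest PMI first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Toy corpus (hypothetical), containing the fixed expressions "ad hoc" and "in spite of".
tokens = ("he did it ad hoc and she did it ad hoc "
          "in spite of him in spite of her").split()
ranked = candidate_bigrams(tokens)
for bigram, score in ranked:
    print(" ".join(bigram), round(score, 2))
```

On this toy input the pair "it ad" receives the same score as "ad hoc", which is in line with the abstract's observation that collocation measures alone do not reliably identify target expressions and that a later classification stage is needed.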

Author information

Authors and Affiliations

Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, 630-0192, Japan
Campbell Hore, Masayuki Asahara & Yūji Matsumoto

Editor information

Editors and Affiliations

Center for Language Technology, Macquarie University, 2019, Sydney, NSW, Australia
Robert Dale
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Kam-Fai Wong
Institute for Infocomm Research, 21, Heng Mui Keng Terrace, 119613, Singapore
Jian Su
Language Information Sciences Research Centre, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
Oi Yee Kwong

About this paper

Cite this paper

Hore, C., Asahara, M., Matsumoto, Y. (2005). Automatic Extraction of Fixed Multiword Expressions. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_50

DOI: https://doi.org/10.1007/11562214_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)