Unsupervised Learning of P NP P Word Combinations

Galicia-Haro, Sofía N.; Gelbukh, Alexander

doi:10.1007/978-3-540-30586-6_37

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3406)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

We evaluate the possibility of learning, in an unsupervised manner, a list of idiomatic word combinations of the type preposition + noun phrase + preposition (P NP P), that is, groups of three or more simple forms that behave as a single lexical unit and have semantic and syntactic properties not deducible from the corresponding properties of each simple form, e.g., "by means of", "in order to", "in front of". We show that idiomatic P NP P combinations have some statistical properties distinct from those of usual idiomatic collocations. In particular, we found that the most frequent P NP P trigrams tend to be idiomatic. Among the other statistical measures, log-likelihood performs almost as well as frequency for detecting idiomatic expressions of this type, while chi-square and pointwise mutual information perform very poorly. We experiment on Spanish material.
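
As a rough illustration of the kind of ranking comparison the abstract describes, the following Python sketch scores candidate trigrams by raw frequency and by pointwise mutual information. It is not the authors' implementation. The preposition list, the simplification of the noun phrase to a single word, the toy token sequence, and the function names `candidate_trigrams` and `score_trigrams` are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical, partial list of Spanish prepositions (an assumption for this sketch).
PREPOSITIONS = {"a", "con", "de", "en", "por", "para", "sin", "sobre"}


def candidate_trigrams(tokens):
    """Yield word trigrams that start and end with a preposition.

    Simplification: the noun phrase is reduced to one non-preposition word.
    """
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        if w1 in PREPOSITIONS and w3 in PREPOSITIONS and w2 not in PREPOSITIONS:
            yield (w1, w2, w3)


def score_trigrams(tokens):
    """Return {trigram: (frequency, pmi)} for every candidate trigram."""
    n = len(tokens)
    unigrams = Counter(tokens)
    trigrams = Counter(candidate_trigrams(tokens))
    total_trigrams = max(n - 2, 1)  # number of trigram positions in the text
    scores = {}
    for tri, freq in trigrams.items():
        p_joint = freq / total_trigrams
        # Probability of the trigram if its three words were independent.
        p_indep = math.prod(unigrams[w] / n for w in tri)
        scores[tri] = (freq, math.log2(p_joint / p_indep))
    return scores


# Toy token sequence, invented for illustration only.
tokens = ("por medio de la carta , a fin de llegar , "
          "por medio de un amigo , en casa de ana").split()
scores = score_trigrams(tokens)
by_freq = sorted(scores, key=lambda t: scores[t][0], reverse=True)
by_pmi = sorted(scores, key=lambda t: scores[t][1], reverse=True)
print(by_freq[0], by_pmi[0])
```

On this toy input, frequency ranks "por medio de" first, whereas pointwise mutual information assigns a higher score to the candidates that occur only once. A real experiment would need a large corpus and proper noun phrase detection.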

Work partially supported by Mexican Government (CONACyT, SNI, CGPI-IPN, PIFI-IPN).

Author information

Authors and Affiliations

Faculty of Sciences, UNAM University City, Mexico City, Mexico
Sofía N. Galicia-Haro
Center for Computing Research, National Polytechnic Institute, Mexico
Alexander Gelbukh

Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Alexander Gelbukh

About this paper

Cite this paper

Galicia-Haro, S.N., Gelbukh, A. (2005). Unsupervised Learning of P NP P Word Combinations. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_37

DOI: https://doi.org/10.1007/978-3-540-30586-6_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24523-0
Online ISBN: 978-3-540-30586-6
eBook Packages: Computer Science, Computer Science (R0)
