Various Criteria of Collocation Cohesion in Internet: Comparison of Resolving Power

Bolshakov, Igor A.; Bolshakova, Elena I.; Kotlyarov, Alexey P.; Gelbukh, Alexander

doi:10.1007/978-3-540-78135-6_6

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4919)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

For extracting collocations from the Internet, it is necessary to numerically estimate the cohesion between potential collocates. The Mutual Information (MI) cohesion measure, based on the numbers of occurrences of the collocates close together (N₁₂) and apart (N₁, N₂), is well known, but Web page statistics deprive MI of its statistical validity. We propose a family of different measures that depend on N₁, N₂, and N₁₂ in a similar monotonic way and possess the scalability feature of MI. We apply the new criteria to a collection of N₁, N₂, and N₁₂ values obtained from AltaVista for links between a few tens of English nouns and several hundred of their modifiers taken from the Oxford Collocations Dictionary. The ‘noun–its own adjective’ pairs are true collocations, and their measure values form one distribution. The ‘noun–alien adjective’ pairs are false collocations, and their measure values form another distribution. We search for the discriminating threshold that minimizes the sum of the probabilities of the two possible types of error. The resolving power of a criterion is equal to the minimum of this sum. The best criterion, the one delivering the minimum minimorum, is found.
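
The procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for MI or for the proposed family of measures, so the code below assumes the common Church and Hanks style form, log2(N · N₁₂ / (N₁ · N₂)), where N (here `total`) is the total number of indexed pages. The function names `mutual_information` and `best_threshold` are hypothetical, and the threshold search works on empirical error rates over two samples of scores rather than on fitted distributions.

```python
import math
from typing import Sequence, Tuple


def mutual_information(n1: int, n2: int, n12: int, total: int) -> float:
    """MI in the assumed form log2(total * n12 / (n1 * n2)).

    n1, n2 -- page counts for each collocate taken separately
    n12    -- page count for the two collocates occurring closely together
    total  -- assumed total number of pages known to the search engine
    """
    if min(n1, n2, n12, total) <= 0:
        raise ValueError("all counts must be positive")
    return math.log2(total * n12 / (n1 * n2))


def best_threshold(
    true_scores: Sequence[float], false_scores: Sequence[float]
) -> Tuple[float, float]:
    """Return (threshold, error_sum) with the smallest empirical error sum.

    A pair is accepted as a collocation when its score >= threshold.
    error_sum = share of true pairs rejected + share of false pairs accepted.
    The minimal error_sum plays the role of the resolving power.
    """
    if not true_scores or not false_scores:
        raise ValueError("both samples must be non-empty")
    # Only observed scores (plus one value above all of them) can change
    # the error rates, so they are the only thresholds worth trying.
    candidates = sorted(set(true_scores) | set(false_scores))
    candidates.append(candidates[-1] + 1.0)
    best_t, best_err = candidates[0], float("inf")
    for t in candidates:
        miss = sum(s < t for s in true_scores) / len(true_scores)
        false_alarm = sum(s >= t for s in false_scores) / len(false_scores)
        if miss + false_alarm < best_err:
            best_t, best_err = t, miss + false_alarm
    return best_t, best_err
```

Under this assumed form, multiplying all four counts by the same factor leaves the value unchanged, which is one reading of the scalability feature mentioned above. To compare several criteria, one would compute `best_threshold` for each criterion on the same two samples and pick the criterion with the smallest error sum.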

This work was done with partial support from the Mexican Government (CONACyT, SNI, CGEPI-IPN) and the Russian Foundation of Fundamental Research (grant 06-01-00571).

Author information

Authors and Affiliations

Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico City, Mexico
Igor A. Bolshakov & Alexey P. Kotlyarov
Faculty of Computational Mathematics and Cybernetics, Moscow State Lomonosov University, Moscow, Russia
Elena I. Bolshakova & Alexander Gelbukh

Editor information

Alexander Gelbukh

About this paper

Cite this paper

Bolshakov, I.A., Bolshakova, E.I., Kotlyarov, A.P., Gelbukh, A. (2008). Various Criteria of Collocation Cohesion in Internet: Comparison of Resolving Power. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2008. Lecture Notes in Computer Science, vol 4919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78135-6_6

DOI: https://doi.org/10.1007/978-3-540-78135-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78134-9
Online ISBN: 978-3-540-78135-6
eBook Packages: Computer Science, Computer Science (R0)
