f: Phrase Relatedness Function Using Overlapping Bi-gram Context

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9673)

Abstract

We present an unsupervised phrase relatedness function (f) that has been applied in a Semantic Textual Similarity system (TrWP) at SemEval-2015, whose best run was ranked 33rd among 73 runs. f measures the relatedness strength between two phrases using overlapping bi-gram contexts extracted from the Google-n-gram corpus; the relatedness strength is the degree of association that captures how similar or dissimilar two phrases are. To compute it, f applies a sum-ratio (SR) technique to the statistics of the overlapping n-grams associated with the two input phrases. Experimental results show that f improves over existing phrase relatedness methods on two standard datasets totalling 216 phrase pairs. f requires no human-annotated resources and is independent of the syntactic structure of the phrases.
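
The following is a minimal sketch of the general idea, not the paper's exact method: the bi-gram context counts, the helper names, and the equal sum-ratio weights are all illustrative assumptions, and the paper's actual extraction from the Google-n-gram corpus and its weighting scheme are not reproduced here.

```python
from collections import Counter

def sum_ratio(x, y, w1=1.0, w2=1.0):
    """Weighted mean of two numbers (footnote 2); the weights are an assumption here."""
    return (w1 * x + w2 * y) / (w1 + w2)

def phrase_relatedness(contexts_a, contexts_b):
    """Toy relatedness score from overlapping bi-gram contexts.

    Each argument maps a bi-gram context to its (hypothetical) corpus frequency
    for one phrase.  The score aggregates the sum-ratio of the frequencies of the
    shared contexts and normalises by the sum-ratio of the total context mass.
    """
    shared = set(contexts_a) & set(contexts_b)   # overlapping bi-gram contexts
    if not shared:
        return 0.0
    overlap = sum(sum_ratio(contexts_a[c], contexts_b[c]) for c in shared)
    total = sum_ratio(sum(contexts_a.values()), sum(contexts_b.values()))
    return min(1.0, overlap / total)

# Hypothetical bi-gram context counts for two phrases (not real corpus statistics).
contexts_a = Counter({("of", "the"): 120, ("in", "a"): 45, ("new", "york"): 10})
contexts_b = Counter({("of", "the"): 90, ("in", "a"): 60, ("hot", "dog"): 5})
print(phrase_relatedness(contexts_a, contexts_b))  # about 0.95 for this toy data
```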

Notes

  1. We use ‘relatedness’ and ‘similarity’ interchangeably in this paper, although ‘similarity’ is a special case, or subset, of ‘relatedness’.

  2. We use the term Sum-Ratio to denote a weighted mean of two numbers.

  3. Pruning the bi-gram contexts implies pruning the Google-n-grams from which those contexts are extracted.

  4. We prefer Pearson’s r to Spearman’s \(\rho \) because Agirre et al. [28] state that Pearson’s r is more informative than Spearman’s \(\rho \): Spearman’s \(\rho \) considers only rank differences, whereas Pearson’s r also takes value differences into account (see the sketch after these notes). Moreover, SemEval-2013 [28] used Pearson’s r for its evaluation task.

  5. Pearson’s r is not computed for Mitchell and Lapata’s [7] system because their individual phrase-pair scores are unavailable. Moreover, in an attempt to reproduce Mitchell and Lapata’s [7] method, Hartung and Frank [6] obtained Spearman’s \(\rho = 0.34\) instead of \(\rho =0.46\) on 108 adjective-noun pairs.
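
A minimal sketch of the comparison referenced in footnote 4, using hypothetical phrase-pair scores (not data from the paper or its datasets): the two score lists below rank the pairs identically, so Spearman's \(\rho \) is exactly 1.0, while Pearson's r is lower because the score values are not linearly aligned.

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical gold and system scores for five phrase pairs
# (illustrative values only, not taken from the paper).
gold = [0.10, 0.30, 0.50, 0.70, 0.90]
system = [0.05, 0.10, 0.20, 0.80, 0.85]  # same ranking as gold, different values

r, _ = pearsonr(gold, system)     # sensitive to value differences (about 0.93 here)
rho, _ = spearmanr(gold, system)  # depends only on rank order (exactly 1.0 here)

print(f"Pearson's r    = {r:.3f}")
print(f"Spearman's rho = {rho:.3f}")
```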

References

  1. Zamir, O., Etzioni, O.: Grouper: a dynamic clustering interface to web search results. In: Proceedings of the Eighth International Conference on World Wide Web, WWW 1999, New York, USA, pp. 1361–1374 (1999)

  2. Chim, H., Deng, X.: Efficient phrase-based document similarity for clustering. IEEE Trans. Knowl. Data Eng. 20(9), 1217–1229 (2008)

  3. Charniak, E.: Statistical Language Learning. MIT Press, Cambridge (1993)

  4. Hammouda, K., Kamel, M.: Efficient phrase-based document indexing for web document clustering. IEEE Trans. Knowl. Data Eng. 16(10), 1279–1296 (2004)

  5. Pera, M.S., Ng, Y.K.: Spamed: a spam e-mail detection approach based on phrase similarity. J. Am. Soc. Inf. Sci. Technol. 60(2), 393–409 (2009)

  6. Hartung, M., Frank, A.: Assessing interpretable, attribute-related meaning representations for adjective-noun phrases in a similarity prediction task. In: Proceedings of the GEMS 2011 Workshop, Stroudsburg, PA, USA, pp. 52–61 (2011)

  7. Mitchell, J., Lapata, M.: Composition in distributional models of semantics. Cogn. Sci. 34(8), 1388–1429 (2010)

  8. Baroni, M.: Composition in distributional semantics. Lang. Linguist. Compass 7(10), 511–522 (2013)

  9. Annesi, P., Storch, V., Basili, R.: Space projections as distributional models for semantic composition. In: Gelbukh, A. (ed.) CICLing 2012, Part I. LNCS, vol. 7181, pp. 323–335. Springer, Heidelberg (2012)

  10. Han, L., Kashyap, A.L., Finin, T., Mayfield, J., Weese, J.: UMBC_EBIQUITY-CORE: semantic textual similarity systems. In: Proceedings of the Second Joint Conference on Lexical and Computational Semantics, June 2013

  11. Tsatsaronis, G., Varlamis, I., Vazirgiannis, M., Nørvåg, K.: Omiotis: a thesaurus-based measure of text relatedness. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds.) ECML PKDD 2009, Part II. LNCS, vol. 5782, pp. 742–745. Springer, Heidelberg (2009)

  12. Bollegala, D., Matsuo, Y., Ishizuka, M.: A web search engine-based approach to measure semantic similarity between words. IEEE Trans. Knowl. Data Eng. 23(7), 977–990 (2011)

  13. Cilibrasi, R.L., Vitanyi, P.M.B.: The Google similarity distance. IEEE Trans. Knowl. Data Eng. 19(3), 370–383 (2007)

  14. Lin, D.: An information-theoretic definition of similarity. In: Proceedings of the Fifteenth ICML, ICML 1998, San Francisco, CA, USA, pp. 296–304 (1998)

  15. Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw-Hill Inc., New York (1986)

  16. Turney, P.D.: Mining the web for synonyms: PMI-IR versus LSA on TOEFL. In: Flach, P.A., De Raedt, L. (eds.) ECML 2001. LNCS (LNAI), vol. 2167, p. 491. Springer, Heidelberg (2001)

  17. Rakib, M.R.H., Islam, A., Milios, E.: TrWP: text relatedness using word and phrase relatedness. In: Proceedings of SemEval 2015, Colorado, pp. 90–95 (2015)

  18. Turney, P.D., Pantel, P.: From frequency to meaning: vector space models of semantics. J. Artif. Int. Res. 37(1), 141–188 (2010)

  19. Lin, D.: Automatic retrieval and clustering of similar words. In: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics, ACL 1998, pp. 768–774 (1998)

  20. Brants, T., Franz, A.: Web 1T 5-gram corpus version 1.1. Linguistic Data Consortium (2006)

  21. Reddy, S., Klapaftis, I., McCarthy, D., Manandhar, S.: Dynamic and static prototype vectors for semantic composition. In: Proceedings of the 5th International Joint Conference on Natural Language Processing, Thailand, pp. 705–713, November 2011

  22. Lund, K., Burgess, C.: Producing high-dimensional semantic spaces from lexical co-occurrence. Behav. Res. Methods Instrum. Comput. 28(2), 203–208 (1996)

  23. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  24. Vilares, M., Ribadas, F.J., Vilares, J.: Phrase similarity through the edit distance. In: Galindo, F., Takizawa, M., Traunmüller, R. (eds.) DEXA 2004. LNCS, vol. 3180, pp. 306–317. Springer, Heidelberg (2004)

  25. Islam, A., Milios, E., Kešelj, V.: Comparing word relatedness measures based on Google-n-grams. In: COLING (Posters), pp. 495–506 (2012)

  26. Gracia, J., Trillo, R., Espinoza, M., Mena, E.: Querying the web: a multiontology disambiguation method. In: Proceedings of the 6th International Conference on Web Engineering, ICWE 2006, pp. 241–248. ACM, New York (2006)

  27. Bohm, G., Zech, G.: Introduction to statistics and data analysis for physicists. DESY (2010)

  28. Agirre, E., Cer, D., Diab, M., Gonzalez-Agirre, A., Guo, W.: *SEM 2013 shared task: semantic textual similarity. In: Second Joint Conference on Lexical and Computational Semantics, Atlanta, Georgia, USA, pp. 32–43, June 2013

  29. Zou, G.Y.: Toward using confidence intervals to compare correlations. Psychol. Methods 12(4), 399–413 (2007)

Author information

Corresponding author

Correspondence to Md. Rashadul Hasan Rakib.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Rakib, M.R.H., Islam, A., Milios, E. (2016). f: Phrase Relatedness Function Using Overlapping Bi-gram Context. In: Khoury, R., Drummond, C. (eds) Advances in Artificial Intelligence. Canadian AI 2016. Lecture Notes in Computer Science (LNAI), vol 9673. Springer, Cham. https://doi.org/10.1007/978-3-319-34111-8_19

  • DOI: https://doi.org/10.1007/978-3-319-34111-8_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-34110-1

  • Online ISBN: 978-3-319-34111-8

  • eBook Packages: Computer Science, Computer Science (R0)
