Abstract
This article presents a comparison of different Word Sense Induction (wsi) clustering algorithms on two novel pseudoword data sets built from semantic-similarity and co-occurrence-based word graphs, with a special focus on the detection of homonymic polysemy. We follow the original definition of a pseudoword as the combination of two monosemous terms and their contexts to simulate a polysemous word. The evaluation is performed by comparing each algorithm’s output on a pseudoword’s ego word graph (i.e., a graph that represents the pseudoword’s context in the corpus) with the known subdivision given by the components corresponding to the monosemous source words forming the pseudoword. The main contribution of this article is a self-sufficient pseudoword-based evaluation framework for graph-based wsi clustering algorithms, which introduces a new evaluation measure (top2) and a secondary clustering process (hyperclustering). To our knowledge, we are the first to conduct and discuss a large-scale, systematic pseudoword evaluation targeting the induction of coarse-grained homonymous word senses across a large number of graph clustering algorithms.
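As a minimal sketch of the pseudoword construction just described, assume that a word’s contexts are available as bags of co-occurring lemmas; the function name make_pseudoword and the toy data below are illustrative only, not taken from the paper.

```python
# Illustrative only: pool the contexts of two monosemous words under an
# artificial token, keeping the source word of each context as a gold
# "sense" label against which induced clusters can later be compared.
def make_pseudoword(word_a, contexts_a, word_b, contexts_b):
    pseudoword = f"{word_a}_{word_b}"
    labelled_contexts = ([(ctx, word_a) for ctx in contexts_a]
                         + [(ctx, word_b) for ctx in contexts_b])
    return pseudoword, labelled_contexts

# Toy example in the spirit of the barque/pennywhistle pair mentioned in the notes.
pw, data = make_pseudoword(
    "barque", [{"sail", "harbour"}, {"mast", "deck"}],
    "pennywhistle", [{"flute", "tune"}, {"tin", "music"}],
)
# A WSI algorithm now clusters the contexts of `pw`; its output is
# evaluated against the stored gold labels.
```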




Notes
An ego word graph of a word w is a graph that represents the context of w in the corpus; alternatively, it can be seen as the neighbourhood of w in a word graph that globally represents the corpus. See Sect. 5.1.1 for the definition of ego word graph in our framework.
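As a rough illustration of this notion, the radius-1 neighbourhood of a node can be extracted with networkx; the toy graph below is an assumption for demonstration, not one of the paper’s word graphs.

```python
# Toy sketch of an ego word graph: the neighbours of a target word and
# the edges among them, extracted from a (here hand-made) word graph.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("bank", "money", 3.0), ("bank", "river", 2.0),
    ("money", "loan", 4.0), ("river", "shore", 1.5),
])

# center=False drops the target word itself, so that only the structure
# of its neighbourhood is left to be clustered.
ego = nx.ego_graph(G, "bank", radius=1, center=False)
print(sorted(ego.nodes()))  # ['money', 'river']
```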
http://wordnetweb.princeton.edu/perl/webwn, Miller (1995).
See for example the results at task 14 of SemEval 2010 (Manandhar et al. 2010), where adjusted mutual information was introduced to correct the bias: https://www.cs.york.ac.uk/semeval2010_WSI/task_14_ranking.html.
In this example a context is informally understood as the lemmatised versions of content words co-occurring with the target word. A formal definition of the kind of context used in our work will be given in Sect. 5.1.1.
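For concreteness, one rough way to obtain such a context is sketched below; the stopword list and lemma table are toy placeholders, not the preprocessing actually used in the paper.

```python
# Toy sketch: the context of a target word as the set of lemmatised
# content words co-occurring with it in a sentence.
STOPWORDS = {"the", "a", "an", "of", "on", "was", "were", "is", "in"}
LEMMAS = {"boats": "boat", "moored": "moor"}

def context(tokens, target):
    return {LEMMAS.get(t, t) for t in tokens
            if t != target and t not in STOPWORDS}

print(context(["the", "boats", "were", "moored", "on", "the", "bank"], "bank"))
# -> {'boat', 'moor'}
```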
If \(\gamma \ne \emptyset \) and/or \(\delta \ne \emptyset \), we are actually considering a non-exhaustive partition or a subpartition, i.e., a collection of disjoint, non-empty subsets whose union is not necessarily the whole set.
More precisely, the implementation found at https://sourceforge.net/p/jobimtext/wiki/Sense_Clustering/, run with parameters -n 200 -N 200.
On this topic cf. Lyons (1968).
The quintiles are the four values that divide a quantity into five equal parts: in this case, they are the multiples of ca. 4.52, i.e., 4.52, 9.04, 13.56 and 18.08.
On this graph-theoretical topic, see e.g., Haynes et al. (1998).
Despite some similarities, our definition of hypergraph is different from the common graph-theoretical concept that goes by the same name, namely that of a graph \(G=(V,E)\) whose edges can be generic subsets of \(V\). See Berge and Minieka (1973) for more details about the subject.
We define the clustering of a set \({\mathcal {S}}\) as a finite collection of non-empty subsets of \({\mathcal {S}}\) whose union is the whole \({\mathcal {S}}\). In this paper, we often assume a clustering to also be a partition, i.e., that the subsets are all disjoint, but for some algorithms like MaxMax this is not always the case.
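The distinction can be made concrete with a small check (our own illustration, not code from the paper):

```python
# A clustering of S: non-empty subsets covering all of S; it is also a
# partition when the subsets are pairwise disjoint (soft algorithms
# such as MaxMax may return overlapping clusters).
def is_clustering(clusters, universe):
    return all(clusters) and set().union(*clusters) == set(universe)

def is_partition(clusters, universe):
    return (is_clustering(clusters, universe)
            and sum(len(c) for c in clusters) == len(set(universe)))

S = {"a", "b", "c", "d"}
print(is_partition([{"a", "b"}, {"c", "d"}], S))        # True
print(is_clustering([{"a", "b"}, {"b", "c", "d"}], S))  # True (overlapping)
print(is_partition([{"a", "b"}, {"b", "c", "d"}], S))   # False
```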
A barque is a type of sailing ship, while a pennywhistle is a small, inexpensive flute.
The mean absolute deviation of a data set is the average of the absolute differences between each observation and the mean of the data set (Dixon and Massey 1957).
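In symbols (our notation, matching the verbal definition above): \[ \mathrm{mad}(x_1,\dots ,x_n)=\frac{1}{n}\sum _{i=1}^{n}\left| x_i-{\bar{x}}\right| ,\quad \text {where } {\bar{x}}=\frac{1}{n}\sum _{i=1}^{n}x_i . \]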
We could normalise the mad score with respect to the total number of clustered elements. However, since the order of our ego graphs is nearly constant, we simply report the mean absolute deviations. The same goes for the mean number of clusters.
References
Amigó, E., Gonzalo, J., Artiles, J., & Verdejo, F. (2009). A comparison of extrinsic clustering evaluation metrics based on formal constraints. Information Retrieval, 12(4), 461–486.
Bagga, A., & Baldwin, B. (1998). Algorithms for scoring coreference chains. In Proceedings of the first international conference on language resources and evaluation (LREC’98), workshop on linguistic coreference (pp. 563–566). European Language Resources Association, Granada, Spain.
Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.
Başkaya, O., & Jurgens, D. (2016). Semi-supervised learning with induced word senses for state of the art word sense disambiguation. Journal of Artificial Intelligence Research, 55, 1025–1058.
Berge, C., & Minieka, E. (1973). Graphs and hypergraphs (Vol. 7). Amsterdam: North-Holland.
Biemann, C. (2006). Chinese whispers: An efficient graph clustering algorithm and its application to natural language processing problems. In Proceedings of the first workshop on graph based methods for natural language processing (pp. 73–80), New York, NY, USA.
Biemann, C., & Quasthoff, U. (2009). Networks generated from natural language text. In N. Ganguly, A. Deutsch & A. Mukherjee (Eds.), Dynamics on and of complex networks: Applications to biology, computer science, and the social sciences (pp. 167–185). Springer.
Biemann, C., & Riedl, M. (2013). Text: Now in 2D! A framework for lexical expansion with contextual similarity. Journal of Language Modelling, 1(1), 55–95.
Bordag, S. (2006). Word sense induction: Triplet-based clustering and automatic evaluation. In Proceedings of the 11th conference of the European chapter of the association for computational linguistics (pp. 137–144). EACL, Trento, Italy.
Ferrer i Cancho, R., & Solé, R. V. (2001). The small world of human language. Proceedings of the Royal Society of London. Series B: Biological Sciences, 268(1482), 2261–2265.
Cecchini, F. M. (2017). Graph-based clustering algorithms for word sense induction. Ph.D. thesis, Università degli Studi di Milano-Bicocca.
Cecchini, F. M., & Fersini, E. (2015). Word sense discrimination: A gangplank algorithm. In Proceedings of the second Italian conference on computational linguistics CLiC-it 2015 (pp. 77–81). Trento, Italy.
Cecchini, F. M., Fersini, E., & Messina, E. (2015). Word sense discrimination on tweets: A graph-based approach. In KDIR 2015—Proceedings of the international conference on knowledge discovery and information retrieval (Vol. 1, pp. 138–146). IC3K, Lisbon.
Cover, T., & Thomas, J. (2012 [1991]). Elements of information theory. Hoboken, NJ: Wiley.
De Marneffe, M. C., MacCartney, B., & Manning, C. (2006). Generating typed dependency parses from phrase structure parses. In Proceedings of the fifth international conference on language resources and evaluation (LREC’06) (pp. 449–454). European Language Resources Association, Genoa.
De Saussure, F. (1995 [1916]). Cours de linguistique générale (critical edition of the 1st edition). Paris: Payot & Rivages.
Di Marco, A., & Navigli, R. (2013). Clustering and diversifying web search results with graph-based word sense induction. Computational Linguistics, 39(3), 709–754.
Dixon, W., & Massey, F, Jr. (1957). Introduction to statistical analysis. New York, NY: McGraw-Hill.
van Dongen, S. (2000). Graph clustering by flow simulation. Ph.D. thesis, Universiteit Utrecht.
Evert, S. (2004). The statistics of word cooccurrences: Word pairs and collocations. Ph.D. thesis, Universität Stuttgart.
Feld, S. L. (1981). The focused organization of social ties. American Journal of Sociology, 86(5), 1015–1035.
Gale, W., Church, K., & Yarowsky, D. (1992). Work on statistical methods for word sense disambiguation. In Technical report of the 1992 fall symposium—Probabilistic approaches to natural language (pp. 54–60). AAAI, Cambridge, MA.
Grätzer, G. (2011). Lattice theory: Foundation. New York: Springer.
Harris, Z. (1954). Distributional structure. Word, 10(2–3), 146–162.
Haynes, T. W., Hedetniemi, S., & Slater, P. (1998). Fundamentals of domination in graphs. Boca Raton, FL: CRC Press.
Hope, D., & Keller, B. (2013). MaxMax: A graph-based soft clustering algorithm applied to word sense induction. In Proceedings of the 14th international conference on computational linguistics and intelligent text processing (pp. 368–381). Samos, Greece.
Karlberg, M. (1997). Testing transitivity in graphs. Social Networks, 19(4), 325–343.
Kilgarriff, A., Rychlý, P., Smrž, P., & Tugwell, D. (2004). The sketch engine. In Proceedings of the eleventh Euralex Conference (pp. 105–116). Lorient, France.
Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129–137.
Lyons, J. (1968). Introduction to theoretical linguistics. Cambridge: Cambridge University Press.
Manandhar, S., Klapaftis, I., Dligach, D., & Pradhan, S. (2010). SemEval-2010 task 14: Word sense induction & disambiguation. In Proceedings of the 5th international workshop on semantic evaluation (pp. 63–68). Association for Computational Linguistics, Los Angeles, CA.
Martin, J., & Jurafsky, D. (2000). Speech and language processing. Upper Saddle River, NJ: Pearson.
Miller, G. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39–41.
Nakov, P., & Hearst, M. (2003). Category-based pseudowords. In Companion volume of the proceedings of the human language technology conference of the North American chapter of the association for computational linguistics (HLT-NAACL) 2003—Short Papers (pp. 70–72). Association for Computational Linguistics, Edmonton, Alberta, Canada.
Navigli, R. (2009). Word sense disambiguation: A survey. ACM Computing Surveys (CSUR), 41(2), 10.
Navigli, R., Litkowski, K., & Hargraves, O. (2007). SemEval-2007 task 07: Coarse-grained English all-words task. In Proceedings of the 4th international workshop on semantic evaluations (pp. 30–35). Association for Computational Linguistics, Prague.
Otrusina, L., & Smrž, P. (2010). A new approach to pseudoword generation. In Proceedings of the seventh international conference on language resources and evaluation (LREC’10) (pp. 1195–1199). European Language Resources Association, Valletta.
Parker, R., Graff, D., Kong, J., Chen, K., & Maeda, K. (2011). English Gigaword, 5th edn. Linguistic Data Consortium, Philadelphia, PA. https://catalog.ldc.upenn.edu/LDC2011T07.
Pilehvar, M. T., & Navigli, R. (2013). Paving the way to a large-scale pseudosense-annotated dataset. In Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: Human language technologies (HLT-NAACL) (pp. 1100–1109). Association for Computational Linguistics, Atlanta, GA.
Pilehvar, M. T., & Navigli, R. (2014). A large-scale pseudoword-based evaluation framework for state-of-the-art word sense disambiguation. Computational Linguistics, 40(4), 837–881.
Richter, M., Quasthoff, U., Hallsteinsdóttir, E., & Biemann, C. (2006). Exploiting the Leipzig corpora collection. In Proceedings of the fifth Slovenian and first international language technologies conference, IS-LTC ’06 (pp. 68–73). Slovenian Language Technologies Society, Ljubljana.
Riedl, M. (2016). Unsupervised methods for learning and using semantics of natural language. Ph.D. thesis, Technische Universität Darmstadt.
Ruohonen, K. (2013). Graph theory (trans: Tamminen, J., Lee, K.-C., & Piché, R.). Tampereen teknillinen yliopisto, lecture notes; originally titled Graafiteoria. http://math.tut.fi/~ruohonen/GT_English.pdf.
Schütze, H. (1992). Dimensions of meaning. In Proceedings of Supercomputing’92 (pp. 787–796). ACM/IEEE, Minneapolis, MN.
Strehl, A., & Ghosh, J. (2002). Cluster ensembles–A knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3, 583–617.
Turney, P., & Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1), 141–188.
Véronis, J. (2004). Hyperlex: Lexical cartography for information retrieval. Computer Speech & Language, 18(3), 223–252.
Watts, D., & Strogatz, S. (1998). Collective dynamics of small-world networks. Nature, 393(6684), 440–442.
Widdows, D., & Dorow, B. (2002). A graph model for unsupervised lexical acquisition. In Proceedings of the 19th international conference on computational linguistics (Vol. 1, pp. 1–7). Association for Computational Linguistics, Taipei.
About this article
Cite this article
Cecchini, F.M., Riedl, M., Fersini, E. et al. A comparison of graph-based word sense induction clustering algorithms in a pseudoword evaluation framework. Lang Resources & Evaluation 52, 733–770 (2018). https://doi.org/10.1007/s10579-018-9415-1