Abstract
This article presents a comparison of different Word Sense Induction (wsi) clustering algorithms on two novel pseudoword data sets built from semantic-similarity and co-occurrence-based word graphs, with a special focus on the detection of homonymic polysemy. We follow the original definition of a pseudoword as the combination of two monosemous terms and their contexts to simulate a polysemous word. The evaluation is performed by comparing each algorithm’s output on a pseudoword’s ego word graph (i.e., a graph that represents the pseudoword’s context in the corpus) with the known subdivision given by the components corresponding to the monosemous source words forming the pseudoword. The main contribution of this article is a self-sufficient pseudoword-based evaluation framework for graph-based wsi clustering algorithms, which introduces a new evaluation measure (top2) and a secondary clustering process (hyperclustering). To our knowledge, we are the first to conduct and discuss a large-scale, systematic pseudoword evaluation targeting the induction of coarse-grained homonymous word senses across a large number of graph clustering algorithms.
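As a minimal sketch of the pseudoword construction just described, assume that a word’s contexts are available as bags of co-occurring lemmas; the function name make_pseudoword and the toy data below are illustrative only, not taken from the paper.

```python
# Illustrative only: pool the contexts of two monosemous words under an
# artificial token, keeping the source word of each context as a gold
# "sense" label against which induced clusters can later be compared.
def make_pseudoword(word_a, contexts_a, word_b, contexts_b):
    pseudoword = f"{word_a}_{word_b}"
    labelled_contexts = ([(ctx, word_a) for ctx in contexts_a]
                         + [(ctx, word_b) for ctx in contexts_b])
    return pseudoword, labelled_contexts

# Toy example in the spirit of the barque/pennywhistle pair mentioned in the notes.
pw, data = make_pseudoword(
    "barque", [{"sail", "harbour"}, {"mast", "deck"}],
    "pennywhistle", [{"flute", "tune"}, {"tin", "music"}],
)
# A WSI algorithm now clusters the contexts of `pw`; its output is
# evaluated against the stored gold labels.
```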




Notes
An ego word graph of a word w is a graph that represents the context of w in the corpus; alternatively, it can be seen as the neighbourhood of w in a word graph that globally represents the corpus. See Sect. 5.1.1 for the definition of ego word graph in our framework.
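As a rough illustration of this notion, the radius-1 neighbourhood of a node can be extracted with networkx; the toy graph below is an assumption for demonstration, not one of the paper’s word graphs.

```python
# Toy sketch of an ego word graph: the neighbours of a target word and
# the edges among them, extracted from a (here hand-made) word graph.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("bank", "money", 3.0), ("bank", "river", 2.0),
    ("money", "loan", 4.0), ("river", "shore", 1.5),
])

# center=False drops the target word itself, so that only the structure
# of its neighbourhood is left to be clustered.
ego = nx.ego_graph(G, "bank", radius=1, center=False)
print(sorted(ego.nodes()))  # ['money', 'river']
```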
http://wordnetweb.princeton.edu/perl/webwn, Miller (1995).
See for example the results at task 14 of SemEval 2010 (Manandhar et al. 2010), where adjusted mutual information was introduced to correct the bias: https://www.cs.york.ac.uk/semeval2010_WSI/task_14_ranking.html.
In this example a context is informally understood as the lemmatised versions of content words co-occurring with the target word. A formal definition of the kind of context used in our work will be given in Sect. 5.1.1.
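For concreteness, one rough way to obtain such a context is sketched below; the stopword list and lemma table are toy placeholders, not the preprocessing actually used in the paper.

```python
# Toy sketch: the context of a target word as the set of lemmatised
# content words co-occurring with it in a sentence.
STOPWORDS = {"the", "a", "an", "of", "on", "was", "were", "is", "in"}
LEMMAS = {"boats": "boat", "moored": "moor"}

def context(tokens, target):
    return {LEMMAS.get(t, t) for t in tokens
            if t != target and t not in STOPWORDS}

print(context(["the", "boats", "were", "moored", "on", "the", "bank"], "bank"))
# -> {'boat', 'moor'}
```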
If \(\gamma \ne \emptyset \) and/or \(\delta \ne \emptyset \), we are actually considering a non-exhaustive partition or a subpartition, i.e., a collection of disjoint, non-empty subsets whose union is not necessarily the whole set.
More precisely, the implementation found at https://sourceforge.net/p/jobimtext/wiki/Sense_Clustering/, run with parameters -n 200 -N 200.
On this topic cf. Lyons (1968).
The quintiles are the four values that divide a quantity into five equal parts: in this case, they are the multiples of ca. 4.52, i.e., 4.52, 9.04, 13.56 and 18.08.
On this graph-theoretical topic, see e.g., Haynes et al. (1998).
Despite some similarities, our definition of hypergraph is different from the common graph-theoretical concept that goes by the same name, namely that of a graph \(G=(V,E)\) whose edges can be generic subsets of \(V\). See Berge and Minieka (1973) for more details about the subject.
We define the clustering of a set \({\mathcal {S}}\) as a finite collection of non-empty subsets of \({\mathcal {S}}\) whose union is the whole \({\mathcal {S}}\). In this paper, we often assume a clustering to also be a partition, i.e., that the subsets are all disjoint, but for some algorithms like MaxMax this is not always the case.
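The distinction can be made concrete with a small check (our own illustration, not code from the paper):

```python
# A clustering of S: non-empty subsets covering all of S; it is also a
# partition when the subsets are pairwise disjoint (soft algorithms
# such as MaxMax may return overlapping clusters).
def is_clustering(clusters, universe):
    return all(clusters) and set().union(*clusters) == set(universe)

def is_partition(clusters, universe):
    return (is_clustering(clusters, universe)
            and sum(len(c) for c in clusters) == len(set(universe)))

S = {"a", "b", "c", "d"}
print(is_partition([{"a", "b"}, {"c", "d"}], S))        # True
print(is_clustering([{"a", "b"}, {"b", "c", "d"}], S))  # True (overlapping)
print(is_partition([{"a", "b"}, {"b", "c", "d"}], S))   # False
```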
A barque is a type of sailing ship, while a pennywhistle is a small, inexpensive flute.
The mean absolute deviation of a data set is the average of the absolute differences between each observation and the mean of the data set (Dixon and Massey 1957).
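In symbols (our notation, matching the verbal definition above): \[ \mathrm{mad}(x_1,\dots ,x_n)=\frac{1}{n}\sum _{i=1}^{n}\left| x_i-{\bar{x}}\right| ,\quad \text {where } {\bar{x}}=\frac{1}{n}\sum _{i=1}^{n}x_i . \]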
We could normalise the mad score with respect to the total number of clustered elements. However, since the order of our ego graphs is nearly constant, we simply report the mean absolute deviations. The same goes for the mean number of clusters.
References
Amigó, E., Gonzalo, J., Artiles, J., & Verdejo, F. (2009). A comparison of extrinsic clustering evaluation metrics based on formal constraints. Information Retrieval, 12(4), 461–486.
Bagga, A., & Baldwin, B. (1998). Algorithms for scoring coreference chains. In Proceedings of the first international conference on language resources and evaluation (LREC’98), workshop on linguistic coreference (pp. 563–566). European Language Resources Association, Granada, Spain.
Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.
Başkaya, O., & Jurgens, D. (2016). Semi-supervised learning with induced word senses for state of the art word sense disambiguation. Journal of Artificial Intelligence Research, 55, 1025–1058.
Berge, C., & Minieka, E. (1973). Graphs and hypergraphs (Vol. 7). Amsterdam: North-Holland.
Biemann, C. (2006). Chinese whispers: An efficient graph clustering algorithm and its application to natural language processing problems. In Proceedings of the first workshop on graph based methods for natural language processing (pp. 73–80), New York, NY, USA.
Biemann, C., & Quasthoff, U. (2009). Networks generated from natural language text. In N. Ganguly, A. Deutsch & A. Mukherjee (Eds.), Dynamics on and of complex networks: Applications to biology, computer science, and the social sciences (pp. 167–185). Springer.
Biemann, C., & Riedl, M. (2013). Text: Now in 2D! A framework for lexical expansion with contextual similarity. Journal of Language Modelling, 1(1), 55–95.
Bordag, S. (2006). Word sense induction: Triplet-based clustering and automatic evaluation. In Proceedings of the 11th conference of the European chapter of the association for computational linguistics (pp. 137–144). EACL, Trento, Italy.
Ferrer i Cancho, R., & Solé, R. V. (2001). The small world of human language. Proceedings of the Royal Society of London. Series B: Biological Sciences, 268(1482), 2261–2265.
Cecchini, F. M. (2017). Graph-based clustering algorithms for word sense induction. Ph.D. thesis, Università degli Studi di Milano-Bicocca.
Cecchini, F. M., & Fersini, E. (2015). Word sense discrimination: A gangplank algorithm. In Proceedings of the second Italian conference on computational linguistics CLiC-it 2015 (pp. 77–81). Trento, Italy.
Cecchini, F. M., Fersini, E., & Messina, E. (2015). Word sense discrimination on tweets: A graph-based approach. In KDIR 2015—Proceedings of the international conference on knowledge discovery and information retrieval (Vol. 1, pp. 138–146). IC3K, Lisbon.
Cover, T., & Thomas, J. (2012 [1991]). Elements of information theory. Hoboken, NJ: Wiley.
De Marneffe, M. C., MacCartney, B., & Manning, C. (2006). Generating typed dependency parses from phrase structure parses. In Proceedings of the fifth international conference on language resources and evaluation (LREC’06) (pp. 449–454). European Language Resources Association, Genoa.
De Saussure, F. (1995 [1916]). Cours de linguistique générale (critical edition of the 1st edition). Paris: Payot & Rivages.
Di Marco, A., & Navigli, R. (2013). Clustering and diversifying web search results with graph-based word sense induction. Computational Linguistics, 39(3), 709–754.
Dixon, W., & Massey, F, Jr. (1957). Introduction to statistical analysis. New York, NY: McGraw-Hill.
van Dongen, S. (2000). Graph clustering by flow simulation. Ph.D. thesis, Universiteit Utrecht.
Evert, S. (2004). The statistics of word cooccurrences: Word pairs and collocations. Ph.D. thesis, Universität Stuttgart.
Feld, S. L. (1981). The focused organization of social ties. American Journal of Sociology, 86(5), 1015–1035.
Gale, W., Church, K., & Yarowsky, D. (1992). Work on statistical methods for word sense disambiguation. In Technical report of the 1992 fall symposium—Probabilistic approaches to natural language (pp. 54–60). AAAI, Cambridge, MA.
Grätzer, G. (2011). Lattice theory: Foundation. New York: Springer.
Harris, Z. (1954). Distributional structure. Word, 10(2–3), 146–162.
Haynes, T. W., Hedetniemi, S., & Slater, P. (1998). Fundamentals of domination in graphs. Boca Raton, FL: CRC Press.
Hope, D., & Keller, B. (2013). MaxMax: A graph-based soft clustering algorithm applied to word sense induction. In Proceedings of the 14th international conference on computational linguistics and intelligent text processing (pp. 368–381). Samos, Greece.
Karlberg, M. (1997). Testing transitivity in graphs. Social Networks, 19(4), 325–343.
Kilgarriff, A., Rychlý, P., Smrž, P., & Tugwell, D. (2004). The sketch engine. In Proceedings of the eleventh Euralex Conference (pp. 105–116). Lorient, France.
Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129–137.
Lyons, J. (1968). Introduction to theoretical linguistics. Cambridge: Cambridge University Press.
Manandhar, S., Klapaftis, I., Dligach, D., & Pradhan, S. (2010). SemEval-2010 task 14: Word sense induction & disambiguation. In Proceedings of the 5th international workshop on semantic evaluation (pp. 63–68). Association for Computational Linguistics, Los Angeles, CA.
Martin, J., & Jurafsky, D. (2000). Speech and language processing. Upper Saddle River, NJ: Pearson.
Miller, G. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39–41.
Nakov, P., & Hearst, M. (2003). Category-based pseudowords. In Companion volume of the proceedings of the human language technology conference of the North American chapter of the association for computational linguistics (HLT-NAACL) 2003—Short Papers (pp. 70–72). Association for Computational Linguistics, Edmonton, Alberta, Canada.
Navigli, R. (2009). Word sense disambiguation: A survey. ACM Computing Surveys (CSUR), 41(2), 10.
Navigli, R., Litkowski, K., & Hargraves, O. (2007). SemEval-2007 task 07: Coarse-grained English all-words task. In Proceedings of the 4th international workshop on semantic evaluations (pp. 30–35). Association for Computational Linguistics, Prague.
Otrusina, L., & Smrž, P. (2010). A new approach to pseudoword generation. In Proceedings of the seventh international conference on language resources and evaluation (LREC’10) (pp. 1195–1199). European Language Resources Association, Valletta.
Parker, R., Graff, D., Kong, J., Chen, K., & Maeda, K. (2011). English Gigaword, 5th edn. Linguistic Data Consortium, Philadelphia, PA. https://catalog.ldc.upenn.edu/LDC2011T07.
Pilehvar, M. T., & Navigli, R. (2013). Paving the way to a large-scale pseudosense-annotated dataset. In Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: Human language technologies (HLT-NAACL) (pp. 1100–1109). Association for Computational Linguistics, Atlanta, GA.
Pilehvar, M. T., & Navigli, R. (2014). A large-scale pseudoword-based evaluation framework for state-of-the-art word sense disambiguation. Computational Linguistics, 40(4), 837–881.
Richter, M., Quasthoff, U., Hallsteinsdóttir, E., & Biemann, C. (2006). Exploiting the Leipzig corpora collection. In Proceedings of the fifth Slovenian and first international language technologies conference, IS-LTC ’06 (pp. 68–73). Slovenian Language Technologies Society, Ljubljana.
Riedl, M. (2016). Unsupervised methods for learning and using semantics of natural language. Ph.D. thesis, Technische Universität Darmstadt.
Ruohonen, K. (2013). Graph theory (trans: Tamminen, J., Lee, K.-C., & Piché, R.). Tampereen teknillinen yliopisto, lecture notes; originally titled Graafiteoria. http://math.tut.fi/~ruohonen/GT_English.pdf.
Schütze, H. (1992). Dimensions of meaning. In Proceedings of Supercomputing’92 (pp. 787–796). ACM/IEEE, Minneapolis, MN.
Strehl, A., & Ghosh, J. (2002). Cluster ensembles–A knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3, 583–617.
Turney, P., & Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1), 141–188.
Véronis, J. (2004). Hyperlex: Lexical cartography for information retrieval. Computer Speech & Language, 18(3), 223–252.
Watts, D., & Strogatz, S. (1998). Collective dynamics of small-world networks. Nature, 393(6684), 440–442.
Widdows, D., & Dorow, B. (2002). A graph model for unsupervised lexical acquisition. In Proceedings of the 19th international conference on computational linguistics (Vol. 1, pp. 1–7). Association for Computational Linguistics, Taipei.
About this article
Cite this article
Cecchini, F.M., Riedl, M., Fersini, E. et al. A comparison of graph-based word sense induction clustering algorithms in a pseudoword evaluation framework. Lang Resources & Evaluation 52, 733–770 (2018). https://doi.org/10.1007/s10579-018-9415-1