Unsupervised word sense induction using rival penalized competitive learning

https://doi.org/10.1016/j.engappai.2015.02.004

Abstract

Word sense induction (WSI) aims to automatically identify the different senses of an ambiguous word from its contexts. It is a nontrivial task in natural language processing because word sense ambiguity is pervasive in linguistic expressions. In this paper, we construct multi-granularity semantic spaces to learn the representations of ambiguous instances, in order to capture richer semantic knowledge during context modeling. In particular, we consider not only the semantic space of words but also the semantic spaces of word clusters and topics. Moreover, to circumvent the difficulty of selecting the number of word senses, we adapt a rival penalized competitive learning method that determines the number of word senses automatically by gradually repelling redundant sense clusters. We validate the effectiveness of our method on several public WSI datasets, and the results show that our method improves the quality of WSI over several competitive baselines.

Introduction

Word sense induction (WSI) is crucial for many natural language processing (NLP) tasks because word sense ambiguity is prevalent in all natural languages. WSI and word sense disambiguation (WSD) are two related techniques for lexical semantic computation. The main distinction between them is that the former discriminates different senses without relying on a predefined sense inventory, while the latter assumes access to an already known sense list. For discriminating different word senses, each occurrence of an ambiguous word is regarded as an ambiguous instance. WSI performs unsupervised sense clustering over these ambiguous instances, and the number of resulting clusters is interpreted as the number of induced word senses. We show an example of WSI for the ambiguous word “ball” in Fig. 1.

We believe that WSI methods face two major challenges. First, contextual semantics are not explored sufficiently during context modeling. In general, shallow lexical features (e.g., word unigrams or bigrams) surrounding the ambiguous instances constitute an important ingredient in sense induction. However, such fine-grained semantic features inevitably suffer from the data sparsity problem. More advanced Bayesian methods use topic models such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003) to learn topic distributions of ambiguous instances. Compared with shallow features, topic features capture latent topic structure and generalize better in semantic representation. Topic models are able to exploit abstract conceptual structures; however, using topic models alone may lose a certain amount of unique lexical semantics during context modeling. We therefore believe that contextual features derived from multi-granularity semantic spaces can reflect various aspects of the semantic knowledge in the contexts.
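To make the idea of multi-granularity context features concrete, the sketch below builds a representation for one ambiguous instance by concatenating a bag-of-words vector (fine-grained), a bag-of-word-clusters vector (medium-grained), and a topic distribution (coarse-grained). The vocabulary, cluster assignments, and topic proportions are toy values invented for illustration, not the paper's actual features.

```python
# Sketch of multi-granularity context features for one ambiguous instance.
# All vocabularies, cluster assignments, and topic distributions below are
# toy assumptions, not the paper's actual data.

def word_features(context, vocab):
    """Bag-of-words vector over a fixed word vocabulary (fine-grained)."""
    return [context.count(w) for w in vocab]

def cluster_features(context, word2cluster, num_clusters):
    """Bag-of-clusters vector: counts of the word clusters seen in context."""
    vec = [0] * num_clusters
    for w in context:
        if w in word2cluster:
            vec[word2cluster[w]] += 1
    return vec

def build_representation(context, vocab, word2cluster, num_clusters, topic_dist):
    """Concatenate word-, cluster-, and topic-level features."""
    return (word_features(context, vocab)
            + cluster_features(context, word2cluster, num_clusters)
            + list(topic_dist))

# Toy example: the context of one occurrence of "ball".
context = ["kick", "the", "ball", "into", "goal"]
vocab = ["kick", "goal", "dance", "music"]
word2cluster = {"kick": 0, "goal": 0, "dance": 1, "music": 1}  # 2 word clusters
topic_dist = [0.9, 0.1]  # e.g., a topic distribution inferred by LDA

rep = build_representation(context, vocab, word2cluster, 2, topic_dist)
print(rep)  # word counts, then cluster counts, then topic proportions
```

The three sub-vectors describe the same context at different granularities: the word vector preserves unique lexical clues, while the cluster and topic vectors generalize over sparse surface forms.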

Second, the sense number of an ambiguous word cannot be determined appropriately. Many popular clustering methods, such as the k-means algorithm, require the cluster number to be pre-assigned precisely. However, in many practical applications it is impossible to know the exact cluster number in advance, so these clustering algorithms often perform poorly (Dehkordi et al., 2009). More recently, a non-parametric Bayesian method (Lau et al., 2012) used Hierarchical Dirichlet Processes (HDP) (Teh et al., 2006) to learn the number of word senses automatically. However, it tends to induce a larger number of senses per ambiguous word than the gold standard on the SemEval-2010 WSI dataset (Lau et al., 2012). Hence, a word sense clustering algorithm that learns appropriate sense numbers for ambiguous words is also crucial for the WSI task.

In this paper, we address the two challenges of WSI mentioned above. We propose a novel WSI framework that automatically induces word senses for ambiguous words over multi-granularity semantic spaces without relying on a pre-assigned number of word senses. In particular, our WSI framework runs in two steps: (1) learning multi-granularity semantic representations for ambiguous instances, and (2) context-based word sense clustering for ambiguous words.

For the first step, our main idea is that discriminating different word senses entails integrating diverse semantic granularities from the contexts. To be specific, we use the Vector Space Model (Salton and Buckley, 1988) to learn the semantic representations of ambiguous instances under the semantic spaces of words, word clusters, and topics. Semantic distances among different semantic granularities are integrated via a concatenation strategy and a linear interpolation strategy (Section 3). For the second step, we adapt a rival penalized competitive learning (RPCL) method to determine the number of word senses automatically by gradually repelling the redundant sense clusters (Section 4). Once our algorithm meets a stopping condition, the centroids of the remaining clusters are considered the representations of different word senses, and the number of remaining clusters is taken as the sense number induced for the ambiguous word. Fig. 2 summarizes the architecture of our proposed method for WSI.
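The linear interpolation strategy can be sketched as a weighted combination of per-granularity distances. The cosine distance and the interpolation weights below are illustrative assumptions; the paper's exact formulation is given in Section 3.

```python
import math

# Sketch of linearly interpolating semantic distances across the word,
# word-cluster, and topic granularities. The weights and the choice of
# cosine distance are illustrative assumptions.

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def interpolated_distance(x, y, weights=(0.4, 0.3, 0.3)):
    """x and y are (word_vec, cluster_vec, topic_vec) triples; the combined
    distance is a convex combination of the per-granularity distances."""
    return sum(w * cosine_distance(u, v)
               for w, (u, v) in zip(weights, zip(x, y)))

x = ([1, 0, 1], [2, 0], [0.9, 0.1])   # instance 1: word, cluster, topic vectors
y = ([1, 1, 0], [2, 0], [0.8, 0.2])   # instance 2
d = interpolated_distance(x, y)
print(d)
```

Because the weights sum to one, the combined value stays on the same scale as the individual cosine distances, so the clustering step can use it directly.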

Our method improves the quality of WSI over several competitive baselines, and the induced sense numbers are close to the gold standard. Specifically, the main contributions of our work are twofold: (1) we integrate multi-granularity semantic spaces to represent the ambiguous instances without resorting to any external resources, and (2) instead of being pre-assigned a fixed number of word senses, our framework automatically determines the sense number of each ambiguous word.

The remainder of this paper is organized as follows: Section 2 summarizes and compares related work. Section 3 presents our method for learning multi-granularity semantic space representations for each ambiguous instance. Section 4 elaborates the context-based word sense clustering for ambiguous words. Section 5 describes our experiments and discusses the results. Finally, Section 6 concludes and outlines future directions.

Section snippets

Related work

In this section, we give an overview of previous methods and the participating systems in the WSI task.

Overview of previous methods in WSI: In general, most research in WSI is based on the Distributional Hypothesis (Harris, 1954), which states that words surrounded by similar contexts tend to have similar meanings. Previous methods have exploited various linguistic features such as first- and second-order context vectors (Purandare and Pedersen, 2004), bigrams and triplets of

Learning multi-granularity semantic space representation

In view of semantic space, we construct three types of semantic space: word, word cluster, and topic. Without loss of generality, let $A=\{a_i \mid i=1,\dots,n\}$ be a set of ambiguous words and $I_{ij}$ be an ambiguous instance of ambiguous word $a_i$. If ambiguous word $a_i$ has $m_i$ ambiguous instances, we denote the set of its instances as $\Psi_{a_i}^{m_i}=\bigcup_{j=1}^{m_i} I_{ij}$. Based on this, we further denote the whole training set of $A$ as $\Psi_{A}^{n}=\bigcup_{i=1}^{n}\Psi_{a_i}^{m_i}$. Consequently, $I_{ij}\in\Psi_{a_i}^{m_i}\subseteq\Psi_{A}^{n}$.

To construct the semantic space, our first response towards the

Word sense clustering without knowing the number of word senses

Our method to conduct the word sense clustering for ambiguous instances is inspired by the RPCL algorithm proposed in Xu et al. (1993). RPCL is able to perform sense clustering via driving redundant sense clusters far away from the input instances, such that the redundant clusters are eliminated automatically.

Without loss of generality, we assume that the ambiguous instances $\{I_{ij}\}_{j=1}^{m_i}$ of ambiguous word $a_i$ come from sense clusters $\{O_{ik}\}_{k=1}^{q_i}$, where $q_i$ is the number of gold standard senses.
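A minimal sketch of the RPCL dynamic, in the spirit of Xu et al. (1993), is given below: for each instance, the nearest centroid (the winner) is pulled toward it while the second-nearest (the rival) is pushed away, so a redundant centroid drifts away from the data and ends up winning no instances. The learning rates, the toy 2-D data, and the pruning rule are illustrative assumptions, not the paper's exact settings.

```python
import random

# Rival penalized competitive learning (RPCL) sketch: the winner is pulled
# toward each instance with rate alpha; the rival is de-learned (pushed away)
# with a much smaller rate beta, repelling redundant centroids.
# Rates, data, and pruning rule below are illustrative assumptions.

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def rpcl(instances, init_centroids, alpha=0.05, beta=0.005, epochs=50):
    centroids = [list(c) for c in init_centroids]
    for _ in range(epochs):
        for x in instances:
            order = sorted(range(len(centroids)),
                           key=lambda k: sq_dist(centroids[k], x))
            winner, rival = order[0], order[1]
            for d in range(len(x)):
                centroids[winner][d] += alpha * (x[d] - centroids[winner][d])
                centroids[rival][d] -= beta * (x[d] - centroids[rival][d])
    return centroids

# Two well-separated "sense" clouds but three initial centroids:
# the redundant third centroid is driven away from the data.
random.seed(0)
data = ([[random.gauss(0, 0.1), random.gauss(0, 0.1)] for _ in range(30)]
        + [[random.gauss(5, 0.1), random.gauss(5, 0.1)] for _ in range(30)])
cents = rpcl(data, [[0.1, 0.1], [5.1, 5.1], [2.5, 2.0]])

# Prune: keep only the sense clusters that still win at least one instance.
assign = [min(range(len(cents)), key=lambda k: sq_dist(cents[k], x))
          for x in data]
induced_senses = len(set(assign))
print(induced_senses)
```

The key design choice is the asymmetry alpha >> beta: the winner tracks the data quickly, while the rival is repelled only gently, so genuine sense centroids are not destabilized but a centroid that never wins is pushed ever farther away and can be pruned.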

Experiments

We conducted a series of experiments on several public WSI datasets derived from the SemEval-2010 (Manandhar et al., 2010), SemEval-2007 (Agirre and Soroa, 2007a), and SemEval-2013 (Navigli and Vannella, 2013) tasks to evaluate the effectiveness of our proposed WSI framework. We evaluated our method using the scoring scripts provided by the task organizers.

Conclusion and future work

We have presented a novel WSI framework that automatically induces word senses for ambiguous words over multi-granularity semantic spaces in an unsupervised fashion. Our method exploits word, word cluster, and topic representations to integrate multi-granularity semantic information during context modeling. Instead of being pre-assigned a fixed sense number, our method induces the sense number automatically for each target word by gradually repelling the redundant sense clusters.

Acknowledgments

We would like to thank all the referees for their constructive and helpful suggestions on this paper. This work is supported by the Natural Science Foundation of China (Grant nos. 61005052, 61075058 and 61303082), the Key Technologies R&D Program of China (Grant no. 2012BAH14F03), the Fundamental Research Funds for the Central Universities (Grant no. 2010121068), the Natural Science Foundation of Fujian Province, China (Grant no. 2010J01351), the Research Fund for the Doctoral Program of Higher

References (45)

  • Brown, P.F., et al., 1992. Class-based n-gram models of natural language. Comput. Linguist.
  • Charniak, E., 2013. Naive Bayes word sense induction. In: Proceedings of the 2013 Conference on Empirical Methods in...
  • Chen, P., Ding, W., Bowes, C., Brown, D., 2009. A fully unsupervised word sense disambiguation method using dependency...
  • Cheung, Y.M., 2002. Rival penalization controlled competitive learning for data clustering with unknown cluster number....
  • Van de Cruys, T., Apidianaki, M., 2011. Latent semantic word sense induction and disambiguation. In: Proceedings of the...
  • Dehkordi, M.Y., et al., 2009. A novel hybrid structure for clustering. Adv. Comput. Sci. Eng.
  • Di Marco, A., et al., 2013. Clustering and diversifying web search results with graph-based word sense induction. Comput. Linguist.
  • Elshamy, W., Caragea, D., Hsu, W.H., 2010. KSU KDD: word sense induction by clustering in topic space. In: Proceedings...
  • Harris, Z.S., 1954. Distributional structure. Word.
  • Hovy, E., Marcus, M., Palmer, M., Ramshaw, L., Weischedel, R., 2006. OntoNotes: the 90% solution. In: Proceedings of...
  • Hubert, L., et al., 1985. Comparing partitions. J. Classif.
  • Jaccard, P., 1901. Étude comparative de la distribution florale dans une portion des Alpes et du Jura. Impr....