Unsupervised word sense induction using rival penalized competitive learning
Introduction
Word sense induction (WSI) is crucial for many natural language processing (NLP) tasks because word sense ambiguity is prevalent in all natural languages. WSI and word sense disambiguation (WSD) are two related techniques for lexical semantic computation. The main distinction between them is that the former discriminates different senses without relying on a predefined sense inventory, while the latter assumes access to an already known sense list. To discriminate different word senses, each occurrence of an ambiguous word is regarded as an ambiguous instance. WSI performs unsupervised sense clustering over these ambiguous instances, and the number of resulting clusters is interpreted as the number of induced word senses. We show an example of WSI for the ambiguous word “ball” in Fig. 1.
We believe that WSI methods face two major challenges. First, contextual semantics are not sufficiently explored during context modeling. In general, shallow lexical features (e.g., unigrams or bigrams) surrounding the ambiguous instances constitute an important ingredient in sense induction. However, such fine-grained semantic features inevitably suffer from the data sparsity problem. More advanced Bayesian methods use topic models such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003) to learn topic distributions of ambiguous instances. Compared with shallow features, topic features capture latent topic structure and generalize better in semantic representation. Topic models can thus exploit abstract conceptual structures; however, using topic models alone may lose a certain amount of unique lexical semantics during context modeling. For these reasons, we believe that contextual features derived from multi-granularity semantic spaces can reflect the various aspects of semantic knowledge in the contexts.
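To make the shallow-feature side of this contrast concrete, the sketch below extracts unigram and bigram features from a fixed window around an ambiguous instance. It is an illustration only, not the paper's actual feature extractor; the window size and the example sentence are assumptions.

```python
def context_features(tokens, target_index, window=3):
    """Shallow lexical features: unigrams and bigrams drawn from a
    fixed-size window around the ambiguous word (the target itself
    is excluded)."""
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    left = tokens[lo:target_index]
    right = tokens[target_index + 1:hi]
    unigrams = set(left + right)
    # bigrams are taken per side so none spans the target word
    bigrams = set(zip(left, left[1:])) | set(zip(right, right[1:]))
    return unigrams, bigrams

sent = "he kicked the ball into the back of the net".split()
unigrams, bigrams = context_features(sent, sent.index("ball"))
```

Features like these are sparse sets over the vocabulary, which is exactly why two contexts expressing the same sense with different words fail to overlap — the sparsity problem the topic-level features are meant to offset.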
Second, the number of senses of an ambiguous word is hard to determine appropriately. Many popular clustering methods such as the k-means algorithm require the cluster number to be pre-assigned precisely. However, in many practical applications it is impossible to know the exact cluster number in advance, so these clustering algorithms often yield poor performance (Dehkordi et al., 2009). More recently, a non-parametric Bayesian method (Lau et al., 2012) used Hierarchical Dirichlet Processes (HDP) (Teh et al., 2006) to learn the number of word senses automatically. However, it tends to induce more senses per ambiguous word than the gold standard on the SemEval-2010 WSI dataset (Lau et al., 2012). Hence, a word sense clustering algorithm that learns an appropriate sense number for each ambiguous word is also crucial for the WSI task.
In this paper, we aim to overcome the two challenges of WSI mentioned above. We propose a novel WSI framework that automatically induces word senses for ambiguous words over multi-granularity semantic spaces without relying on a pre-assigned number of word senses. In particular, our WSI framework runs in two steps: (1) learning multi-granularity semantic representations for ambiguous instances, and (2) context-based word sense clustering for ambiguous words.
For the first step, our main idea is that discriminating different word senses entails integrating diverse semantic granularities from the contexts. Specifically, we use the Vector Space Model (Salton and Buckley, 1988) to learn semantic representations of ambiguous instances under the semantic spaces of words, word clusters and topics. Semantic distances across the different granularities are integrated via a concatenation strategy and a linear interpolation strategy (Section 3). For the second step, we adapt a rival penalized competitive learning (RPCL) method to determine the number of word senses automatically by gradually repelling redundant sense clusters (Section 4). Once the algorithm meets a stopping condition, the centroids of the remaining clusters are taken as the representations of the different word senses, and the number of remaining clusters is taken as the sense number induced for the ambiguous word. Fig. 2 summarizes the architecture of our proposed method for WSI.
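The two integration strategies can be sketched as follows. This is a minimal illustration assuming cosine distance and made-up interpolation weights and toy vectors; it is not the exact formulation given in Section 3.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb) if na and nb else 1.0

def concat_distance(inst_a, inst_b):
    """Concatenation strategy: join the word-, cluster- and topic-level
    vectors into one long vector, then take a single cosine distance."""
    a = [x for granularity in inst_a for x in granularity]
    b = [x for granularity in inst_b for x in granularity]
    return cosine_distance(a, b)

def interpolated_distance(inst_a, inst_b, weights=(0.4, 0.3, 0.3)):
    """Linear interpolation strategy: one cosine distance per granularity,
    combined with interpolation weights (the values here are illustrative)."""
    return sum(w * cosine_distance(a, b)
               for w, a, b in zip(weights, inst_a, inst_b))

# one toy instance = (word vector, word-cluster vector, topic vector)
inst_a = ([1, 0, 1], [0.2, 0.8], [0.5, 0.5, 0.0])
inst_b = ([1, 1, 0], [0.3, 0.7], [0.4, 0.6, 0.0])
```

Concatenation lets dimensions of all granularities compete in one space, while interpolation keeps each granularity's distance separate and controls their relative influence through the weights.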
Our method improves the quality of WSI over several competitive baselines, and the induced sense numbers are close to the gold standard. Specifically, the main contributions of our work lie in two aspects: (1) we integrate multi-granularity semantic spaces to represent the ambiguous instances without resorting to any external resources, and (2) instead of being pre-assigned a fixed number of word senses, our framework automatically determines the sense number of each ambiguous word.
The remainder of this paper is organized as follows: Section 2 summarizes and compares related work. Section 3 presents our method for learning a multi-granularity semantic space representation for each ambiguous instance. Section 4 elaborates the context-based word sense clustering for ambiguous words. Section 5 describes our experiments and discusses the results. Finally, Section 6 concludes and outlines future directions.
Section snippets
Related work
In this section, we give an overview of previous methods and the participating systems in the WSI task.
Overview of previous methods in WSI: In general, most research on WSI builds on the Distributional Hypothesis (Harris, 1954), which states that words occurring in similar contexts tend to have similar meanings. Previous methods have exploited various linguistic features such as first- and second-order context vectors (Purandare and Pedersen, 2004), bigrams and triplets of
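As an illustration of one such feature type, a second-order context vector represents a context by the centroid of the co-occurrence vectors of its words, rather than by the words themselves. The sketch below is a minimal, hypothetical version; the toy co-occurrence matrix is invented.

```python
def second_order_vector(context_words, cooc):
    """Second-order context representation: the mean of the co-occurrence
    vectors of the context words (first-order features would instead mark
    the context words directly)."""
    vecs = [cooc[w] for w in context_words if w in cooc]
    if not vecs:
        dim = len(next(iter(cooc.values())))
        return [0.0] * dim
    return [sum(vals) / len(vecs) for vals in zip(*vecs)]

# toy co-occurrence matrix over a 3-dimensional feature vocabulary (invented)
cooc = {
    "kicked": [2.0, 0.0, 1.0],
    "net":    [1.0, 0.0, 3.0],
}
vec = second_order_vector(["kicked", "the", "net"], cooc)
```

Because contexts sharing no literal words can still average similar co-occurrence vectors, second-order representations partially mitigate the sparsity of first-order features.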
Learning multi-granularity semantic space representation
In view of semantic space, we construct three types of semantic space: word, word cluster, and topic. Without loss of generality, let A = {a_1, …, a_n} be a set of ambiguous words and I_ij be an ambiguous instance of ambiguous word a_i. If ambiguous word a_i contains m_i ambiguous instances, then we denote them as I_i = {I_i1, …, I_i,m_i}. Based on this, we further denote the whole training set of A as I = I_1 ∪ … ∪ I_n. Consequently, |I| = m_1 + … + m_n.
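This notation maps directly onto a small data structure: one instance list per ambiguous word, with the training set as their union. The instance strings below are invented examples for illustration.

```python
from collections import defaultdict

# Hypothetical instance lists keyed by ambiguous word:
# instances[a_i] plays the role of I_i = {I_i1, ..., I_i,m_i}.
instances = defaultdict(list)
instances["ball"] = ["he kicked the ball wide", "guests at the masked ball"]
instances["bank"] = ["sat on the river bank", "the bank lent money", "a bank holiday"]

m = {a: len(I_i) for a, I_i in instances.items()}  # m_i per ambiguous word
total = sum(m.values())                            # |I| = m_1 + ... + m_n
```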
To construct the semantic space, our first response towards the
Word sense clustering without knowing the number of word senses
Our method for conducting word sense clustering over ambiguous instances is inspired by the RPCL algorithm proposed in Xu et al. (1993). RPCL performs sense clustering by driving redundant sense clusters far away from the input instances, such that the redundant clusters are eliminated automatically.
Without loss of generality, we assume that the ambiguous instances of ambiguous word a_i come from sense clusters {c_i1, …, c_i,q_i}, where q_i is the number of gold standard senses.
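A minimal sketch of the RPCL update is given below. It is simplified: it omits the winning-frequency (conscience) weighting used in Xu et al. (1993), and the learning rates, initial cluster count and pruning rule are all assumptions for illustration, not the settings of Section 4.

```python
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rpcl(points, k_init=6, lr_winner=0.05, lr_rival=0.002, epochs=60, seed=0):
    """Simplified rival penalized competitive learning: for each instance,
    the nearest centroid (winner) moves toward it while the second-nearest
    (rival) is pushed away with a much smaller rate, so redundant centroids
    drift off the data and can be pruned."""
    rng = random.Random(seed)
    cents = [list(rng.choice(points)) for _ in range(k_init)]
    for _ in range(epochs):
        for p in points:
            order = sorted(range(k_init), key=lambda j: dist2(cents[j], p))
            w, r = order[0], order[1]
            for d in range(len(p)):
                cents[w][d] += lr_winner * (p[d] - cents[w][d])  # attract winner
                cents[r][d] -= lr_rival * (p[d] - cents[r][d])   # repel rival
    # surviving clusters = centroids that still win at least one instance
    alive = {min(range(k_init), key=lambda j: dist2(cents[j], p)) for p in points}
    return [cents[j] for j in sorted(alive)]

# two well-separated toy "senses" in 2-D
rng = random.Random(42)
data = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(20)]
data += [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(20)]
centroids = rpcl(data)
```

The key design point is the asymmetry lr_rival ≪ lr_winner: a gentle, repeated push is enough to expel centroids that never become stable winners, which is how the cluster count is decided without being pre-assigned.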
Experiments
We conducted a series of experiments on several public WSI datasets derived from the SemEval-2010 (Manandhar et al., 2010), SemEval-2007 (Agirre and Soroa, 2007a) and SemEval-2013 (Navigli and Vannella, 2013) tasks to evaluate the effectiveness of our proposed WSI framework. We evaluated our method using the evaluation scripts provided by the task organizers.
Conclusion and future work
We have presented a novel WSI framework that automatically induces word senses for ambiguous words over multi-granularity semantic spaces in an unsupervised fashion. Our method exploits word, word cluster and topic representations to integrate multi-granularity semantic information during context modeling. Instead of being pre-assigned a fixed sense number, our method induces the sense number automatically for each target word by gradually repelling redundant sense clusters.
Acknowledgments
We would like to thank all the referees for their constructive and helpful suggestions on this paper. This work is supported by the Natural Science Foundation of China (Grant nos. 61005052, 61075058 and 61303082), the Key Technologies R&D Program of China (Grant no. 2012BAH14F03), the Fundamental Research Funds for the Central Universities (Grant no. 2010121068), the Natural Science Foundation of Fujian Province, China (Grant no. 2010J01351), the Research Fund for the Doctoral Program of Higher
References (45)
- et al. Competitive learning algorithms for vector quantization. Neural Netw. (1990)
- et al. Categorical-and-numerical-attribute data clustering based on a unified similarity metric without knowing cluster number. Pattern Recognit. (2013)
- Salton, G., Buckley, C. Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. (1988)
- Agirre, E., Soroa, A., 2007a. SemEval-2007 task 02: evaluating word sense induction and discrimination systems. In: ...
- Agirre, E., Soroa, A., 2007b. UBC-AS: a graph based unsupervised system for induction and classification. In: ...
- et al. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Lang. Resour. Eval. (2009)
- Biemann, C., Heyer, G., Quasthoff, U., Richter, M., 2007. The Leipzig corpora collection: monolingual corpora of ...
- Blei et al. Latent Dirichlet allocation. J. Mach. Learn. Res. (2003)
- Bordag, S., 2006. Word sense induction: triplet-based clustering and automatic evaluation. In: Proceedings of the 11th ...
- Brody, S., Lapata, M., 2009. Bayesian word sense induction. In: Proceedings of the 12th Conference of the European ...
- Class-based n-gram models of natural language. Comput. Linguist.
- A novel hybrid structure for clustering. Adv. Comput. Sci. Eng.
- Clustering and diversifying web search results with graph-based word sense induction. Comput. Linguist.
- Harris, Z. Distributional structure. Word (1954)
- Comparing partitions. J. Classif.