Abstract
In this paper, we propose a word sense learning algorithm that performs unsupervised feature selection and cluster number identification. Feature selection for word sense learning is built on an entropy-based filter and formalized as a constrained optimization problem whose output is a set of important features. Cluster number identification is built on a Gaussian mixture model with an MDL-based criterion, and the optimal model order is inferred by minimizing that criterion. To evaluate how close the learned sense clusters are to the ground-truth classes, we introduce a weighted F-measure that models the effort needed to reconstruct the classes from the clusters. Experiments show that the algorithm can retrieve important features, roughly estimate the number of classes automatically, and outperform other algorithms in terms of the weighted F-measure. We also apply the algorithm to the specific task of adding new words to a Chinese thesaurus.
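As a rough illustration of the model order step, the sketch below fits Gaussian mixtures with an increasing number of components and keeps the order that minimizes a two-part MDL score. It is not the authors' implementation: it assumes one-dimensional data instead of bag-of-words context vectors, assumes the common penalty form of half the number of free parameters times log N, and the names `fit_gmm_1d`, `mdl_score`, and `select_order` are hypothetical.

```python
import math
from statistics import NormalDist


def fit_gmm_1d(xs, k, iters=100, var_floor=1e-3):
    """Run EM for a k-component 1-D Gaussian mixture; return the log-likelihood."""
    n = len(xs)
    srt = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    variances = [max(sum((x - centre) ** 2 for x in xs) / n, var_floor)] * k
    weights = [1.0 / k] * k
    loglik = float("-inf")
    for _ in range(iters):
        # E-step: responsibilities via log-sum-exp. The log-likelihood is
        # evaluated under the parameters in force before this M-step.
        resp, loglik = [], 0.0
        for x in xs:
            logs = [
                math.log(weights[j])
                - 0.5 * math.log(2 * math.pi * variances[j])
                - (x - means[j]) ** 2 / (2 * variances[j])
                for j in range(k)
            ]
            top = max(logs)
            lse = top + math.log(sum(math.exp(v - top) for v in logs))
            loglik += lse
            resp.append([math.exp(v - lse) for v in logs])
        # M-step: re-estimate weights, means, and (floored) variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(spread, var_floor)
    return loglik


def mdl_score(xs, k):
    """Assumed MDL form: -log L + 0.5 * (#free parameters) * log N."""
    n_params = 3 * k - 1  # k means, k variances, k - 1 free mixing weights
    return -fit_gmm_1d(xs, k) + 0.5 * n_params * math.log(len(xs))


def select_order(xs, max_k=4):
    """Return the order with the smallest MDL score, plus all scores."""
    scores = {k: mdl_score(xs, k) for k in range(1, max_k + 1)}
    return min(scores, key=scores.get), scores


# Toy data (an assumption for illustration): two well-separated groups
# built from evenly spaced Gaussian quantiles, centred at 0 and 10.
cluster = [NormalDist(0.0, 1.0).inv_cdf((i + 0.5) / 100) for i in range(100)]
data = cluster + [x + 10.0 for x in cluster]
best_k, scores = select_order(data, max_k=4)
print(best_k, {k: round(v, 1) for k, v in scores.items()})
```

The penalty term grows with the number of components, so an extra component is accepted only when it improves the log-likelihood by more than its description cost. This is the trade-off that minimizing an MDL-style criterion encodes.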
Notes
We use bag of words to denote the contexts of word occurrences.
For a Chinese character or word, we list its Pinyin (//) and English equivalent unless ambiguous.
Cite this article
Ji, D., He, Y. & Xiao, G. Word sense learning based on feature selection and MDL principle. Lang Resources & Evaluation 40, 375–393 (2006). https://doi.org/10.1007/s10579-007-9030-z
DOI: https://doi.org/10.1007/s10579-007-9030-z