
Word sense learning based on feature selection and MDL principle

Published in Language Resources and Evaluation

Abstract

In this paper, we propose a word sense learning algorithm capable of unsupervised feature selection and cluster number identification. Feature selection for word sense learning is built on an entropy-based filter and formalized as a constrained optimization problem whose output is a set of important features. Cluster number identification is built on a Gaussian mixture model with an MDL-based criterion, and the optimal model order is inferred by minimizing that criterion. To evaluate the closeness between the learned sense clusters and the ground-truth classes, we introduce a weighted F-measure that models the effort needed to reconstruct the classes from the clusters. Experiments show that the algorithm retrieves important features, roughly estimates the number of classes automatically, and outperforms other algorithms in terms of the weighted F-measure. In addition, we apply the algorithm to the specific task of adding new words to a Chinese thesaurus.
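To make the two components summarized above concrete, the following is a minimal sketch rather than the authors' implementation: the entropy-based filter is read here as the similarity-entropy filter in the style of Dash and Liu (an assumption about the paper's exact filter), the MDL criterion is approximated by BIC/2 via scikit-learn, and the data, thresholds, and function names are illustrative.

import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.mixture import GaussianMixture


def similarity_entropy(X):
    """Entropy of pairwise similarities; low when the data form tight, well-separated clusters."""
    d = squareform(pdist(X))
    alpha = -np.log(0.5) / (d.mean() + 1e-12)        # similarity 0.5 at the mean distance
    s = np.clip(np.exp(-alpha * d), 1e-12, 1.0 - 1e-12)
    e = -(s * np.log(s) + (1.0 - s) * np.log(1.0 - s))
    return e[np.triu_indices_from(e, k=1)].sum()


def rank_features(X):
    """Rank features: removing an important feature should raise the entropy the most."""
    scores = [similarity_entropy(np.delete(X, j, axis=1)) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1]                   # most important feature first


def mdl_select(X, k_max=6):
    """Choose the mixture order that minimizes an MDL-style score, approximated here by BIC/2."""
    models = [GaussianMixture(n_components=k, covariance_type="diag", random_state=0).fit(X)
              for k in range(1, k_max + 1)]
    best = min(models, key=lambda g: g.bic(X) / 2.0)
    return best.n_components, best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "context vectors": two informative dimensions carrying two senses, four noise dimensions.
    informative = np.vstack([rng.normal(0.0, 1.0, (150, 2)),
                             rng.normal(6.0, 1.0, (150, 2))])
    X = np.hstack([informative, rng.normal(0.0, 1.0, (300, 4))])

    keep = rank_features(X)[:2]                       # retain the top-ranked features
    k, model = mdl_select(X[:, keep])
    print("selected features:", keep, "estimated number of senses:", k)

On data shaped like this toy example, the filter should rank the two informative dimensions first and the description length should be minimized at two mixture components, mirroring the behavior described above.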

Notes

  1. We use a bag of words to represent the contexts of word occurrences; a minimal construction sketch follows these notes.

  2. For a Chinese character or word, we list its Pinyin (//) and English equivalent unless ambiguous.

  3. http://www.news.sina.com.cn

  4. http://www.keenage.com
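
The bag-of-words representation mentioned in note 1 can be made concrete with a small sketch, under assumed tokenization and window size; the function name and example sentences are purely illustrative.

from collections import Counter

import numpy as np


def context_vectors(sentences, target, window=3):
    """One bag-of-words count vector per occurrence of `target`, over a shared vocabulary."""
    bags = []
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == target:
                ctx = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                bags.append(Counter(ctx))
    vocab = sorted({w for bag in bags for w in bag})
    X = np.array([[bag[w] for w in vocab] for bag in bags], dtype=float)
    return X, vocab


# Two occurrences of "bank" used in different senses.
sents = [["the", "bank", "raised", "interest", "rates"],
         ["we", "sat", "on", "the", "bank", "of", "the", "river"]]
X, vocab = context_vectors(sents, "bank")
print(vocab)
print(X)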

Author information

Correspondence to Donghong Ji.

About this article

Cite this article

Ji, D., He, Y. & Xiao, G. Word sense learning based on feature selection and MDL principle. Lang Resources & Evaluation 40, 375–393 (2006). https://doi.org/10.1007/s10579-007-9030-z
