Word sense learning based on feature selection and MDL principle

Ji, Donghong; He, Yanxiang; Xiao, Guozheng

doi:10.1007/s10579-007-9030-z

Published: 18 July 2007

Language Resources and Evaluation, volume 40, pages 375–393 (2006)

Donghong Ji¹,², Yanxiang He¹ & Guozheng Xiao³

Abstract

In this paper, we propose a word sense learning algorithm that performs unsupervised feature selection and cluster number identification. Feature selection is built on an entropy-based filter and formalized as a constrained optimization problem, whose output is a set of important features. Cluster number identification is built on a Gaussian mixture model with an MDL-based criterion, and the optimal model order is inferred by minimizing that criterion. To evaluate the closeness between the learned sense clusters and the ground-truth classes, we introduce a weighted F-measure that models the effort needed to reconstruct the classes from the clusters. Experiments show that the algorithm retrieves important features, roughly estimates the number of classes automatically, and outperforms other algorithms in terms of the weighted F-measure. In addition, we explore applying the algorithm to the specific task of adding new words to a Chinese thesaurus.
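
The abstract does not state the exact MDL criterion, so the following is only a minimal sketch of how a cluster number might be chosen by minimizing a description length over Gaussian mixture models. It assumes one-dimensional toy data, a plain EM fit, and the generic two-part score -log L + (p / 2) log N, where p is the number of free parameters. The function names (`mixture_densities`, `fit_gmm_1d`, `mdl_score`), the variance floor, and the number of restarts are illustrative choices, not taken from the paper.

```python
import math
import random


def mixture_densities(x, weights, means, variances):
    """Weighted density of each Gaussian component at point x."""
    return [
        w * math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in zip(weights, means, variances)
    ]


def fit_gmm_1d(xs, k, iters=100, seed=0):
    """Fit a k-component 1-D Gaussian mixture with EM; return its log-likelihood."""
    rng = random.Random(seed)
    n = len(xs)
    grand_mean = sum(xs) / n
    grand_var = sum((x - grand_mean) ** 2 for x in xs) / n
    min_var = 1e-2 * grand_var      # assumed variance floor to avoid collapse
    weights = [1.0 / k] * k
    means = rng.sample(xs, k)       # initialise means at random data points
    variances = [grand_var] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = mixture_densities(x, weights, means, variances)
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, min_var)
    return sum(
        math.log(max(sum(mixture_densities(x, weights, means, variances)), 1e-300))
        for x in xs
    )


def mdl_score(xs, k, restarts=5):
    """Two-part description length: -log L + (p / 2) * log N."""
    best_loglik = max(fit_gmm_1d(xs, k, seed=s) for s in range(restarts))
    n_params = 3 * k - 1            # k means, k variances, k - 1 free weights
    return -best_loglik + 0.5 * n_params * math.log(len(xs))


# Toy data: two well-separated groups standing in for two senses of a word.
rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(100)]
data += [rng.gauss(10.0, 1.0) for _ in range(100)]

mdl_scores = {k: mdl_score(data, k) for k in range(1, 5)}
best_k = min(mdl_scores, key=mdl_scores.get)
for k, score in mdl_scores.items():
    print(f"K={k}: MDL={score:.1f}")
print("selected number of clusters:", best_k)
```

The penalty term grows with each added component, so an extra cluster is kept only when it improves the log-likelihood by more than the cost of describing its parameters. In a real setting the inputs would be high-dimensional context vectors restricted to the selected features rather than scalar toy values.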

Notes

We use bag of words to denote the contexts of word occurrences.
For a Chinese character or word, we list its Pinyin (//) and English equivalent unless ambiguous.
http://www.news.sina.com.cn
http://www.keenage.com

Author information

Authors and Affiliations

School of Computer Science, Wuhan University, Wuhan, 430072, China
Donghong Ji & Yanxiang He
Institute for Infocomm Research, Singapore, 119613, Singapore
Donghong Ji
Center for Study of Language and Information, Wuhan University, Wuhan, 430072, China
Guozheng Xiao

Corresponding author

Correspondence to Donghong Ji.

About this article

Cite this article

Ji, D., He, Y. & Xiao, G. Word sense learning based on feature selection and MDL principle. Lang Resources & Evaluation 40, 375–393 (2006). https://doi.org/10.1007/s10579-007-9030-z

Received: 24 August 2006
Accepted: 29 May 2007
Published: 18 July 2007
Issue Date: December 2006
DOI: https://doi.org/10.1007/s10579-007-9030-z
