Semi-supervised Clustering for Word Instances and Its Effect on Word Sense Disambiguation

Sugiyama, Kazunari; Okumura, Manabu

doi:10.1007/978-3-642-00382-0_22


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5449)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics


Abstract

We propose a supervised word sense disambiguation (WSD) system that uses features derived from the results of clustering word instances. Our approach is novel in two respects. First, we employ a semi-supervised clustering method that controls the fluctuation of cluster centroids. Second, when introducing “must-link” constraints between seed instances, we select the seeds by considering the frequency distribution of word senses, and we exclude outliers. We then improve supervised WSD accuracy by using features computed from the word instances in the clusters generated by this semi-supervised clustering. Experimental results show that these features are effective in improving WSD accuracy.
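The abstract names three ingredients: seed selection that respects sense frequencies and drops outliers, clustering in which must-linked seeds stay together while centroid drift is limited, and cluster-derived features for the supervised classifier. The Python sketch below strings these together under illustrative assumptions. It is not the authors' algorithm. The following are all hypothetical:

- the frequency-proportional seed quota;
- the median-distance outlier rule;
- the `damping` blend of each centroid with its seed centroid;
- the one-cluster-per-sense setup;
- the function names (`select_seeds`, `seeded_kmeans`, `cluster_feature`);
- the toy sense labels.

The must-link handling follows the general seeded/constrained k-means idea, in which seeds are pinned to the cluster of their label.

```python
import math
from collections import defaultdict
from statistics import median


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def select_seeds(labeled, budget, outlier_factor=2.0):
    """Pick seed instances per sense from (vector, sense) pairs.

    Hypothetical rules (not from the paper): each sense gets a share of
    `budget` proportional to its frequency (at least one seed), and an
    instance is an outlier if it lies more than `outlier_factor` times the
    median distance from its sense's mean vector.
    """
    by_sense = defaultdict(list)
    for vec, sense in labeled:
        by_sense[sense].append(vec)
    seeds = {}
    for sense, vecs in by_sense.items():
        center = mean(vecs)
        dists = [dist(v, center) for v in vecs]
        cutoff = outlier_factor * median(dists)
        # Rank instances by closeness to the sense mean and drop outliers.
        ranked = sorted(zip(dists, range(len(vecs))))
        inliers = [vecs[i] for d, i in ranked if d <= cutoff]
        quota = max(1, round(budget * len(vecs) / len(labeled)))
        seeds[sense] = inliers[:quota]
    return seeds


def seeded_kmeans(data, seeds, damping=0.5, max_iter=20):
    """Constrained k-means with one cluster per sense.

    Seeds of the same sense are must-linked: they always stay in that
    sense's cluster. The hypothetical `damping` weight blends each
    recomputed centroid with the centroid of its seeds, limiting drift.
    """
    senses = list(seeds)
    anchor = {s: mean(seeds[s]) for s in senses}
    centroids = dict(anchor)
    assign = None
    for _ in range(max_iter):
        # Assign every unlabeled instance to its nearest centroid.
        new_assign = {i: min(senses, key=lambda s: dist(x, centroids[s]))
                      for i, x in enumerate(data)}
        if new_assign == assign:
            break  # assignments are stable
        assign = new_assign
        for s in senses:
            members = seeds[s] + [data[i] for i, t in assign.items() if t == s]
            centroids[s] = [damping * a + (1 - damping) * c
                            for a, c in zip(anchor[s], mean(members))]
    return assign, centroids


def cluster_feature(assign, i):
    """Hypothetical extra WSD feature: the cluster instance i fell into."""
    return {"cluster=" + assign[i]: 1.0}


labeled = [([0.0, 0.0], "bank/finance"), ([0.2, 0.1], "bank/finance"),
           ([0.1, 0.3], "bank/finance"), ([5.0, 5.0], "bank/river"),
           ([5.2, 4.9], "bank/river")]
seeds = select_seeds(labeled, budget=3)
assign, _ = seeded_kmeans([[0.3, 0.2], [4.8, 5.1], [5.5, 5.3]], seeds)
print(assign)                      # {0: 'bank/finance', 1: 'bank/river', 2: 'bank/river'}
print(cluster_feature(assign, 1))  # {'cluster=bank/river': 1.0}
```

With `damping=0` the update reduces roughly to a plain seeded, constrained k-means. Values closer to 1 keep each centroid near its seeds, which is one simple way to limit centroid fluctuation. The paper's own mechanism may differ.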



Author information

Authors and Affiliations

Precision and Intelligence Laboratory, Tokyo Institute of Technology, 4259 Nagatsuta, Midori, Yokohama, Kanagawa, 226-8503, Japan
Kazunari Sugiyama & Manabu Okumura

Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Sugiyama, K., Okumura, M. (2009). Semi-supervised Clustering for Word Instances and Its Effect on Word Sense Disambiguation. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2009. Lecture Notes in Computer Science, vol 5449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00382-0_22

DOI: https://doi.org/10.1007/978-3-642-00382-0_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00381-3
Online ISBN: 978-3-642-00382-0
eBook Packages: Computer Science, Computer Science (R0)
