Investigating Problems of Semi-supervised Learning for Word Sense Disambiguation

Le, Anh-Cuong; Shimazu, Akira; Nguyen, Le-Minh

doi:10.1007/11940098_51


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4285)

Included in the following conference series:

International Conference on Computer Processing of Oriental Languages


Abstract

Word Sense Disambiguation (WSD) is the problem of determining the right sense of a polysemous word in a given context. In this paper, we investigate the use of unlabeled data for WSD within the framework of semi-supervised learning, in which the original labeled dataset is iteratively extended by exploiting unlabeled data. This paper addresses two problems that arise in this approach: determining which subset of newly labeled data to add at each extension, and generating the final classifier. By giving solutions to these problems, we generate several variants of bootstrapping algorithms and apply them to word sense disambiguation. The experiments were conducted on the datasets of four words (interest, line, hard, and serve) and on the English lexical sample of Senseval-3.
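
The iterative extension described in the abstract can be illustrated with a generic self-training loop. The following is only a minimal sketch under stated assumptions, not the authors' algorithm or any of their variants: the hand-written naive Bayes classifier, the bag-of-words features, the confidence threshold of 0.75, the function names, and the toy contexts for "interest" are all hypothetical choices made for illustration. The threshold test stands in for the first problem (choosing which newly labeled examples to add at each extension), and retraining on the final extended set is one simple answer to the second (generating the final classifier).

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Train a multinomial naive Bayes model from (tokens, sense) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(tokens)
        vocab.update(tokens)
    return sense_counts, word_counts, vocab


def predict_proba(model, tokens):
    """Return {sense: posterior probability}, using add-one smoothing."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    log_scores = {}
    for sense, count in sense_counts.items():
        denom = sum(word_counts[sense].values()) + len(vocab)
        score = math.log(count / total)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][tok] + 1) / denom)
        log_scores[sense] = score
    top = max(log_scores.values())
    exp = {s: math.exp(v - top) for s, v in log_scores.items()}
    norm = sum(exp.values())
    return {s: v / norm for s, v in exp.items()}


def self_train(labeled, unlabeled, threshold=0.75, max_iter=10):
    """Iteratively move confidently labeled contexts into the labeled set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_iter):
        model = train_nb(labeled)
        confident, rest = [], []
        for tokens in pool:
            probs = predict_proba(model, tokens)
            sense, p = max(probs.items(), key=lambda kv: kv[1])
            if p >= threshold:  # assumed selection rule for new labeled data
                confident.append((tokens, sense))
            else:
                rest.append(tokens)
        if not confident:  # nothing new to add, so stop extending
            break
        labeled.extend(confident)
        pool = rest
    # Assumed final classifier: retrain once on the fully extended set.
    return train_nb(labeled), labeled


# Toy, made-up contexts for the ambiguous word "interest".
seed = [
    (["bank", "rate", "loan"], "money"),
    (["hobby", "great", "music"], "attention"),
]
raw = [
    ["loan", "rate", "percent"],
    ["music", "hobby", "art"],
    ["percent", "bank"],
]
final_model, extended = self_train(seed, raw)
probs_art = predict_proba(final_model, ["art"])
```

In this toy run the context `["percent", "bank"]` is not confident enough in the first pass and is only labeled in the second, after the word "percent" has entered the labeled set. This shows how each extension changes what the next one can label, and also why an early mislabeled example could propagate to later iterations.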



Author information

Authors and Affiliations

School of Information Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa, 923-1292, Japan
Anh-Cuong Le, Akira Shimazu & Le-Minh Nguyen


Editor information

Editors and Affiliations

Graduate School of Information Science, Nara Institute of Science and Technology, 630-0192, Takayama, Ikoma, Nara, Japan
Yuji Matsumoto
Dept of ECE, University of Illinois at Urbana Champaign, IL 61801, Urbana, USA
Richard W. Sproat
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Kam-Fai Wong
State Key Lab of Intelligent Tech. & Sys., Tsinghua University,
Min Zhang


About this paper

Cite this paper

Le, AC., Shimazu, A., Nguyen, LM. (2006). Investigating Problems of Semi-supervised Learning for Word Sense Disambiguation. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_51


DOI: https://doi.org/10.1007/11940098_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer Science, Computer Science (R0)
