Word Sense Disambiguation by Learning Decision Trees from Unlabeled Data

Park, Seong-Bae; Zhang, Byoung-Tak; Kim, Yung Taek

doi:10.1023/A:1023812606045

Published: July 2003

Applied Intelligence, volume 19, pages 27–38 (2003)

Abstract

In this paper we describe a machine learning approach to word sense disambiguation that uses unlabeled data. Our method is based on selective sampling with committees of decision trees. The committee members are trained on a small set of labeled examples, which is then augmented by a large number of unlabeled examples. Using unlabeled examples is important because obtaining labeled data is expensive and time-consuming, whereas collecting a large number of unlabeled examples is easy and inexpensive. The idea behind this approach is that the labels of unlabeled examples can be estimated by the committees. Using additional unlabeled examples therefore improves the performance of word sense disambiguation and minimizes the cost of manual labeling. The effectiveness of this approach was examined on a raw corpus of one million words. Using unlabeled data, we achieved an accuracy improvement of up to 20.2%.
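The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea of letting a committee of decision trees estimate labels for unlabeled examples. Several things here are assumptions made for illustration and are not taken from the paper: the committee members are depth-1 trees (decision stumps) trained on bootstrap samples, an unlabeled example is accepted when a fixed fraction of the committee agrees on its sense, and the names `train_stump`, `train_committee`, `self_label`, `min_agreement`, and the toy "bank" data are all hypothetical.

```python
import random
from collections import Counter


def train_stump(examples):
    """Fit a depth-1 decision tree (a stump) on (features, sense) pairs.

    Assumption: a stump stands in for a full decision tree to keep the sketch short.
    """
    default = Counter(sense for _, sense in examples).most_common(1)[0][0]
    best = None
    for f in range(len(examples[0][0])):
        by_value = {}
        for x, sense in examples:
            by_value.setdefault(x[f], Counter())[sense] += 1
        # Number of training examples this feature would classify correctly.
        correct = sum(c.most_common(1)[0][1] for c in by_value.values())
        if best is None or correct > best[0]:
            table = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
            best = (correct, f, table)
    _, feature, table = best
    # Unseen feature values fall back to the majority sense of the sample.
    return lambda x: table.get(x[feature], default)


def train_committee(labeled, size, rng):
    """Train each committee member on a bootstrap sample of the labeled set."""
    return [train_stump([rng.choice(labeled) for _ in labeled]) for _ in range(size)]


def self_label(labeled, unlabeled, size=5, min_agreement=0.8, rounds=3, seed=0):
    """Grow the labeled set with unlabeled examples the committee agrees on.

    Assumption: the agreement threshold and round limit are illustrative choices.
    """
    rng = random.Random(seed)
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        committee = train_committee(labeled, size, rng)
        remaining = []
        for x in pool:
            votes = Counter(tree(x) for tree in committee)
            sense, count = votes.most_common(1)[0]
            if count / size >= min_agreement:
                labeled.append((x, sense))  # committee estimate used as the label
            else:
                remaining.append(x)  # too much disagreement, leave unlabeled
        if len(remaining) == len(pool):
            break  # nothing new was labeled in this round
        pool = remaining
    return labeled, pool


if __name__ == "__main__":
    # Toy data: two context words around the ambiguous word "bank".
    seed_examples = [
        (("river", "water"), "shore"),
        (("river", "mud"), "shore"),
        (("loan", "money"), "finance"),
        (("loan", "credit"), "finance"),
    ]
    raw_examples = [("river", "credit"), ("loan", "water"), ("deposit", "money")]
    grown, left_over = self_label(seed_examples, raw_examples)
    print(len(grown), "labeled examples;", len(left_over), "still unlabeled")
```

Examples on which the committee disagrees stay in the pool; in a selective sampling setting those would be natural candidates for manual labeling, which keeps the annotation effort focused on the uncertain cases.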

Author information

Authors and Affiliations

Biointelligence Lab, School of Computer Science and Engineering, Seoul National University, Seoul, 151-742, Korea
Seong-Bae Park, Byoung-Tak Zhang & Yung Taek Kim

Corresponding author

Correspondence to Byoung-Tak Zhang.

About this article

Cite this article

Park, SB., Zhang, BT. & Kim, Y.T. Word Sense Disambiguation by Learning Decision Trees from Unlabeled Data. Applied Intelligence 19, 27–38 (2003). https://doi.org/10.1023/A:1023812606045

Issue Date: July 2003
DOI: https://doi.org/10.1023/A:1023812606045
