Abstract
In this paper we describe a machine learning approach to word sense disambiguation that uses unlabeled data. Our method is based on selective sampling with committees of decision trees. The committee members are first trained on a small set of labeled examples, and this training set is then augmented with a large number of unlabeled examples. Using unlabeled examples matters because labeled data are expensive and time-consuming to obtain, whereas large numbers of unlabeled examples are easy and inexpensive to collect. The key idea is that a committee can estimate the labels of unlabeled examples. Adding these examples therefore improves word sense disambiguation performance while reducing the cost of manual labeling. We evaluated the approach on a raw corpus of one million words; using unlabeled data improved accuracy by up to 20.2%.
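The core loop described above (train a committee on a few labeled examples, let it label unlabeled examples, keep the ones it is confident about, retrain) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: depth-1 decision trees (stumps) stand in for full decision trees, committee members are built from a per-sense bootstrap of the training data, and an unlabeled example is accepted only when the committee is unanimous (`threshold=1.0`). The function names, the 0/1 context-feature encoding, and the toy "finance"/"river" senses are all hypothetical.

```python
import random
from collections import Counter

# Toy encoding (an assumption for illustration): each example is a tuple of
# 0/1 features, e.g. "does context word k appear near the ambiguous word?".


def train_stump(xs, ys):
    """Depth-1 decision tree: choose the single feature split with fewest training errors."""
    default = Counter(ys).most_common(1)[0][0]
    best_err, best = sum(y != default for y in ys), None
    for f in range(len(xs[0])):
        left = [y for x, y in zip(xs, ys) if x[f] == 0]
        right = [y for x, y in zip(xs, ys) if x[f] == 1]
        if not left or not right:
            continue
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0]
        err = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if err < best_err:  # strict "<" keeps the earliest feature on ties
            best_err, best = err, (f, l_lab, r_lab)
    if best is None:
        return lambda x: default  # no useful split: predict the majority sense
    f, l_lab, r_lab = best
    return lambda x: r_lab if x[f] == 1 else l_lab


def train_committee(xs, ys, size, rng):
    """Bagging with a per-sense (stratified) bootstrap, so every member sees every sense."""
    by_sense = {}
    for x, y in zip(xs, ys):
        by_sense.setdefault(y, []).append(x)
    members = []
    for _ in range(size):
        bx, by = [], []
        for sense, group in by_sense.items():
            for _ in range(len(group)):
                bx.append(rng.choice(group))
                by.append(sense)
        members.append(train_stump(bx, by))
    return members


def committee_vote(members, x):
    """Return the majority sense and the fraction of members that voted for it."""
    label, count = Counter(m(x) for m in members).most_common(1)[0]
    return label, count / len(members)


def self_label(xs, ys, unlabeled, size=7, threshold=1.0, rounds=3, seed=0):
    """Grow the training set with committee-labeled examples, then retrain."""
    rng = random.Random(seed)
    xs, ys, pool = list(xs), list(ys), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        members = train_committee(xs, ys, size, rng)
        keep = []
        for x in pool:
            label, agreement = committee_vote(members, x)
            if agreement >= threshold:  # committee is confident: adopt its label
                xs.append(x)
                ys.append(label)
            else:
                keep.append(x)
        if len(keep) == len(pool):
            break  # nothing new was confidently labeled this round
        pool = keep
    return train_committee(xs, ys, size, rng), xs, ys, pool


# Usage on toy data: feature 0 marks a "money"-like context word.
labeled_x = [(1, i % 2, (i // 2) % 2) for i in range(10)] + \
            [(0, i % 2, (i // 2) % 2) for i in range(10)]
labeled_y = ["finance"] * 10 + ["river"] * 10
unlabeled = [(1, i % 2, 0) for i in range(15)] + [(0, 0, i % 2) for i in range(15)]

committee, xs, ys, leftover = self_label(labeled_x, labeled_y, unlabeled)
print(len(xs), "training examples after self-labeling")
```

Requiring unanimity adds fewer examples per round but keeps the committee-assigned labels cleaner; a lower `threshold` admits more unlabeled examples at the risk of propagating labeling errors into later rounds.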
Cite this article
Park, SB., Zhang, BT. & Kim, Y.T. Word Sense Disambiguation by Learning Decision Trees from Unlabeled Data. Applied Intelligence 19, 27–38 (2003). https://doi.org/10.1023/A:1023812606045