
Word Sense Disambiguation by Learning Decision Trees from Unlabeled Data

Published in Applied Intelligence

Abstract

In this paper we describe a machine learning approach to word sense disambiguation that uses unlabeled data. Our method is based on selective sampling with committees of decision trees. The committee members are trained on a small set of labeled examples, which is then augmented with a large number of unlabeled examples. Using unlabeled examples is important because obtaining labeled data is expensive and time-consuming, whereas collecting a large number of unlabeled examples is easy and inexpensive. The idea behind this approach is that the labels of unlabeled examples can be estimated by the committee. Using additional unlabeled examples therefore improves the performance of word sense disambiguation and minimizes the cost of manual labeling. The effectiveness of this approach was examined on a raw corpus of one million words. Using unlabeled data, we achieved an accuracy improvement of up to 20.2%.
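The committee-based idea described in the abstract can be sketched in a few lines. The following is a minimal illustration, assuming scikit-learn's DecisionTreeClassifier as the tree learner, context features already encoded as numeric arrays, and an illustrative committee size and agreement threshold; it is not the authors' exact algorithm, only the general "train a committee, self-label confidently agreed-upon unlabeled examples, retrain" loop that the abstract outlines.

```python
# Sketch of committee-based label estimation for word sense disambiguation.
# Assumptions (not from the paper): scikit-learn decision trees, numeric
# feature matrices, committee of 5 bootstrap-trained members, 80% agreement
# threshold for accepting an estimated label.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_committee(X_labeled, y_labeled, n_members=5, rng=None):
    """Train a committee of decision trees on bootstrap resamples of the labeled seed set."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(X_labeled)
    committee = []
    for _ in range(n_members):
        idx = rng.integers(0, n, size=n)              # bootstrap sample
        tree = DecisionTreeClassifier().fit(X_labeled[idx], y_labeled[idx])
        committee.append(tree)
    return committee

def augment_with_unlabeled(committee, X_unlabeled, agreement=0.8):
    """Estimate labels for unlabeled examples and keep those the committee agrees on."""
    votes = np.stack([m.predict(X_unlabeled) for m in committee])  # (members, examples)
    keep, estimated = [], []
    for j in range(votes.shape[1]):
        labels, counts = np.unique(votes[:, j], return_counts=True)
        top = counts.argmax()
        if counts[top] / len(committee) >= agreement:  # strong consensus -> trust the label
            keep.append(j)
            estimated.append(labels[top])
    return np.array(keep), np.array(estimated)

# Hypothetical usage: retrain on the labeled seed set plus the confidently
# self-labeled examples, mirroring the "augment then relearn" loop.
# committee = train_committee(X_lab, y_lab)
# idx, y_est = augment_with_unlabeled(committee, X_unlab)
# X_aug = np.vstack([X_lab, X_unlab[idx]])
# y_aug = np.concatenate([y_lab, y_est])
# final_tree = DecisionTreeClassifier().fit(X_aug, y_aug)
```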



Author information

Corresponding author

Correspondence to Byoung-Tak Zhang.


About this article

Cite this article

Park, SB., Zhang, BT. & Kim, Y.T. Word Sense Disambiguation by Learning Decision Trees from Unlabeled Data. Applied Intelligence 19, 27–38 (2003). https://doi.org/10.1023/A:1023812606045
