Tagger Evaluation Given Hierarchical Tag Sets

Computers and the Humanities

Abstract

We present methods for evaluating human and automatic taggers that extend current practice in three ways. First, we show how to evaluate taggers that assign multiple tags to each test instance, even if they do not assign probabilities. Second, we show how to accommodate a common property of manually constructed "gold standards" that are typically used for objective evaluation, namely that there is often more than one correct answer. Third, we show how to measure performance when the set of possible tags is tree-structured in an IS-A hierarchy. To illustrate how our methods can be used to measure inter-annotator agreement, we show how to compute the kappa coefficient over hierarchical tag sets.
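
For orientation, the kappa coefficient mentioned above corrects observed agreement P(A) for the agreement P(E) expected by chance: kappa = (P(A) - P(E)) / (1 - P(E)). The sketch below computes the standard two-annotator (Cohen) form over a flat tag set with exactly one tag per instance. It is not the hierarchical, multi-tag extension that the article describes; the function name `cohen_kappa` and the example tags are hypothetical, chosen only for illustration.

```python
from collections import Counter


def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa for two annotators over a FLAT tag set.

    Assumption: each annotator assigns exactly one tag per instance.
    This is the textbook baseline, not the hierarchical variant.
    """
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two non-empty tag sequences of equal length")
    n = len(tags_a)

    # P(A): proportion of instances on which the annotators agree.
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n

    # P(E): chance agreement from each annotator's own tag distribution.
    freq_a = Counter(tags_a)
    freq_b = Counter(tags_b)
    expected = sum(freq_a[t] * freq_b[t] for t in freq_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of four instances with tags "N" and "V".
annotator_1 = ["N", "N", "V", "V"]
annotator_2 = ["N", "N", "V", "N"]
print(cohen_kappa(annotator_1, annotator_2))
```

Here the annotators agree on three of four instances (P(A) = 0.75) and chance agreement is 0.5, so kappa is 0.5. Exact-match agreement of this kind gives no partial credit for tags that are close in an IS-A hierarchy, which is the limitation the abstract says the article addresses.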

Cite this article

Melamed, I.D., Resnik, P. Tagger Evaluation Given Hierarchical Tag Sets. Computers and the Humanities 34, 79–84 (2000). https://doi.org/10.1023/A:1002402902356

  • DOI: https://doi.org/10.1023/A:1002402902356
