Abstract
Automatic Term Recognition (ATR) is the task of identifying domain-specific terms in technical corpora. Termhood-based approaches measure the degree to which a candidate term refers to a domain-specific concept. Unithood-based approaches measure the attachment strength of a candidate term's constituents. These methods have been evaluated using different, often incompatible, evaluation schemes and datasets. This paper provides an overview and a thorough evaluation of state-of-the-art ATR methods under a common evaluation framework, i.e. the same corpora and evaluation method. Our contributions are twofold: (1) we compare a number of different ATR methods, showing that termhood-based methods generally achieve superior performance; (2) we show that the number of independent occurrences of a candidate term is the most effective source for estimating term nestedness, improving ATR performance.
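To make the two notions in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes pointwise mutual information as one example of a unithood-style association score, and it assumes that an "independent" occurrence is one where the candidate appears as a whole extracted phrase rather than inside a longer one. The function names `pmi` and `occurrence_counts` and the toy data are hypothetical.

```python
import math
from collections import Counter


def pmi(tokens, w1, w2):
    """Pointwise mutual information (in bits) of the adjacent pair (w1, w2).

    Used here as one example of a unithood-style score: it is high when the
    two words co-occur more often than their individual frequencies predict.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    p_xy = bigrams[(w1, w2)] / (len(tokens) - 1)
    if p_xy == 0:
        return float("-inf")  # the pair never occurs adjacently
    p_x = unigrams[w1] / len(tokens)
    p_y = unigrams[w2] / len(tokens)
    return math.log2(p_xy / (p_x * p_y))


def occurrence_counts(candidate, phrases):
    """Return (independent, nested) occurrence counts for a candidate term.

    `phrases` is a list of extracted candidate phrases (token sequences).
    Assumption: an occurrence is independent when the phrase equals the
    candidate, and nested when the candidate is a proper sub-span of it.
    """
    cand = tuple(candidate)
    k = len(cand)
    independent = nested = 0
    for phrase in phrases:
        phrase = tuple(phrase)
        if phrase == cand:
            independent += 1
        else:
            nested += sum(
                1
                for i in range(len(phrase) - k + 1)
                if phrase[i:i + k] == cand
            )
    return independent, nested


# Toy usage with made-up data.
toy_tokens = ["t", "cell", "receptor", "binds", "t", "cell"]
toy_phrases = [
    ("t", "cell"),
    ("t", "cell", "receptor"),
    ("t", "cell", "receptor"),
    ("b", "cell"),
]
unithood_score = pmi(toy_tokens, "t", "cell")
independent, nested = occurrence_counts(("t", "cell"), toy_phrases)
```

In this toy setting, a candidate that mostly occurs nested inside longer phrases would receive less credit as a term in its own right than one that frequently occurs independently.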
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Korkontzelos, I., Klapaftis, I.P., Manandhar, S. (2008). Reviewing and Evaluating Automatic Term Recognition Techniques. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_24
DOI: https://doi.org/10.1007/978-3-540-85287-2_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85286-5
Online ISBN: 978-3-540-85287-2
eBook Packages: Computer Science, Computer Science (R0)