Mining Biomedical Abstracts: What’s in a Term?

Nenadic, Goran; Spasic, Irena; Ananiadou, Sophia

doi:10.1007/978-3-540-30211-7_85


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3248)

Included in the following conference series:

International Conference on Natural Language Processing


Abstract

In this paper we present a study of how terminology is used in the biomedical literature, with the main aim of identifying phenomena that can help automatic term recognition in the domain. Our analysis is based on the terminology appearing in the Genia corpus. We analyse the usage of biomedical terms and their variants (namely inflectional and orthographic alternatives, terms with prepositions, coordinated terms, etc.), showing the variability and dynamic nature of the terms used in biomedical abstracts. Term coordination and terms containing prepositions are analysed in detail. We also show that there is a discrepancy between the terms used in the literature and those listed in controlled dictionaries. In addition, we briefly evaluate the effectiveness of incorporating the treatment of different types of term variation into an automatic term recognition system.
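To make the variant types named in the abstract concrete, here is a minimal Python sketch of how orthographic variants, inflectional variants and simple head coordination might be collapsed onto a shared key. It is an illustration under stated assumptions, not the authors' system: the helpers `canonical_key`, `expand_head_coordination` and `group_variants` are hypothetical, the rules are deliberately naive (lower-casing, treating hyphens and slashes as spaces, stripping a plural "-s" from the head noun, expanding only the "X and Y head" pattern), and the example terms are illustrative rather than drawn from the paper's data.

```python
import re
from collections import defaultdict


def normalise_orthography(term: str) -> str:
    """Hypothetical rule: lower-case and treat hyphens/slashes as spaces,
    so 'NF-kappaB' and 'NF kappaB' share one form."""
    term = term.lower()
    term = re.sub(r"[-/]", " ", term)
    return re.sub(r"\s+", " ", term).strip()


def strip_plural(word: str) -> str:
    """Very rough inflectional normalisation of a head noun (assumption)."""
    if word.endswith(("ss", "us", "is")):  # e.g. 'class', 'virus', 'analysis'
        return word
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"  # 'activities' -> 'activity'
    if word.endswith("s") and len(word) > 3:
        return word[:-1]  # 'cells' -> 'cell'
    return word


def canonical_key(term: str) -> str:
    """Key shared by orthographic and inflectional variants of one term."""
    words = normalise_orthography(term).split()
    if words:
        words[-1] = strip_plural(words[-1])  # only the head (last word) is inflected
    return " ".join(words)


def expand_head_coordination(phrase: str) -> list[str]:
    """Expand the single pattern 'X and/or Y head' into ['X head', 'Y head'];
    anything else is returned unchanged (hypothetical, highly simplified)."""
    m = re.fullmatch(r"(\S+) (?:and|or) (\S+) (.+)", phrase.strip())
    if not m:
        return [phrase]
    left, right, head = m.groups()
    return [f"{left} {head}", f"{right} {head}"]


def group_variants(occurrences: list[str]) -> dict[str, set[str]]:
    """Group surface forms (after expanding coordination) by canonical key."""
    groups: dict[str, set[str]] = defaultdict(set)
    for occ in occurrences:
        for term in expand_head_coordination(occ):
            groups[canonical_key(term)].add(term)
    return dict(groups)


if __name__ == "__main__":
    print(group_variants(["B and T cells", "T-cell", "T cells", "NF-kappaB", "NF kappaB"]))
```

With these rules, "B and T cells" contributes both "B cells" and "T cells", and "T cells" and "T-cell" fall under the same key "t cell". Hand-written rules like these over- and under-generate (for instance, "IL2" and "IL-2" still receive different keys), which is one reason the kinds of variation described in the abstract are worth studying empirically.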



Author information

Authors and Affiliations

Department of Computation, UMIST, Manchester, M60 1QD, UK
Goran Nenadic
Department of Chemistry, UMIST, Manchester, M60 1QD, UK
Irena Spasic
Computer Science, University of Salford, Salford, M5 4WT, UK
Sophia Ananiadou
National Centre for Text Mining, Manchester, UK
Goran Nenadic, Irena Spasic & Sophia Ananiadou


Editor information

Editors and Affiliations

Behavior Design Corporation, IV Science-Based Industrial Park Hsinchu, 2F, No.5, Industry E. Rd, Taiwan
Keh-Yih Su
University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033; JST CREST, Honcho 4-1-8, Kawaguchi-shi, Saitama 332-0012
Jun’ichi Tsujii
Pohang University of Science and Technology (POSTECH), AITrc, Republic of Korea
Jong-Hyeok Lee
Language Information Sciences Research Centre, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
Oi Yee Kwong


About this paper

Cite this paper

Nenadic, G., Spasic, I., Ananiadou, S. (2005). Mining Biomedical Abstracts: What’s in a Term? In: Su, K.-Y., Tsujii, J., Lee, J.-H., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_85


DOI: https://doi.org/10.1007/978-3-540-30211-7_85
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24475-2
Online ISBN: 978-3-540-30211-7
eBook Packages: Computer Science, Computer Science (R0)
