Abstract
An approach and software tools are described for identifying and extracting compound terms (CTs), acronyms, and their associated contexts from the textual material that accompanies neuroanatomical atlases. A set of simple syntactic rules was appended to the output of a commercially available part-of-speech (POS) tagger (Qtag v 3.01) to extract CTs and their associated context from the texts of neuroanatomical atlases. This “hybrid” parser appears to be highly sensitive: it recognized 96% of the potentially germane neuroanatomical CTs and acronyms present in the cat and primate thalamic atlases.
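As an illustration of how syntactic rules can be layered on POS-tagger output, the following is a minimal sketch and not the rule set used in the study, which the abstract does not list. It assumes the text has already been tagged into (word, tag) pairs with Penn Treebank-style tags; the rule "a run of adjectives and nouns that ends in a noun and spans at least two words" and the function name `extract_compound_terms` are hypothetical.

```python
from typing import List, Tuple

# Assumed Penn Treebank-style tags; the tagset and rules of the actual
# hybrid parser are not given in the abstract.
MODIFIER_TAGS = {"JJ"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def extract_compound_terms(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return multi-word candidate terms from a POS-tagged token sequence.

    Hypothetical rule: collect maximal runs of adjectives and nouns,
    trim the run so that it ends in a noun, and keep it if at least
    two words remain.
    """
    terms: List[str] = []
    run: List[Tuple[str, str]] = []

    def flush() -> None:
        # Drop trailing adjectives so the candidate ends in a noun.
        while run and run[-1][1] not in NOUN_TAGS:
            run.pop()
        if len(run) >= 2:
            terms.append(" ".join(word for word, _ in run))
        run.clear()

    for word, tag in tagged:
        if tag in MODIFIER_TAGS or tag in NOUN_TAGS:
            run.append((word, tag))
        else:
            flush()  # any other tag ends the current candidate
    flush()
    return terms


sentence = [
    ("The", "DT"), ("ventral", "JJ"), ("posterior", "JJ"), ("nucleus", "NN"),
    ("lies", "VBZ"), ("near", "IN"), ("the", "DT"),
    ("internal", "JJ"), ("capsule", "NN"), (".", "."),
]
print(extract_compound_terms(sentence))
```

A rule of this kind favors sensitivity over precision: it proposes every adjective-noun run as a candidate and leaves filtering to a later, possibly manual, step.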
Neuroanatomical CTs and acronyms in the cat and primate atlas texts were initially compared using exact-term matching. Implementing string-matching algorithms significantly improved the identification of relevant terms and acronyms shared between the two domains. The End Gap Free string matcher identified 98% of CTs, and the Needleman-Wunsch (NW) string matcher matched 36% of acronyms between the two atlases.
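The two matchers named above differ only in how gaps at the ends of the strings are charged. The sketch below shows both variants of the standard dynamic-programming alignment score. The scoring values (match +1, mismatch -1, gap -1) and the function name `alignment_score` are assumptions for illustration; the abstract does not state the parameters or thresholds used in the study.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -1, end_gaps_free: bool = False) -> int:
    """Score the best alignment of strings a and b.

    end_gaps_free=False: Needleman-Wunsch global alignment, where every
    gap is penalized. end_gaps_free=True: leading and trailing gaps cost
    nothing, so one string may align inside or overlap the other.
    """
    n, m = len(a), len(b)
    # Row 0: aligning the empty prefix of a with prefixes of b.
    prev = [0 if end_gaps_free else j * gap for j in range(m + 1)]
    best_last_col = prev[m]
    for i in range(1, n + 1):
        cur = [0 if end_gaps_free else i * gap] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        best_last_col = max(best_last_col, cur[m])
        prev = cur
    if end_gaps_free:
        # Best cell anywhere in the last row or the last column.
        return max(best_last_col, max(prev))
    return prev[m]


# An OCR-style substitution ("l" read as "1") costs one mismatch.
print(alignment_score("nucleus", "nuc1eus"))
# A shorter term contained in a longer one: global vs. end-gap-free.
print(alignment_score("lateral geniculate nucleus", "geniculate nucleus"))
print(alignment_score("lateral geniculate nucleus", "geniculate nucleus",
                      end_gaps_free=True))
```

Under these assumed scores, the end-gap-free variant does not penalize the unmatched prefix "lateral ", which is why it suits comparing multi-word terms of different lengths, while the global score is stricter and better suited to short strings such as acronyms.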
Combining several simple grammatical and lexical rules with the POS tagger (the “hybrid” parser) (1) extracted complex neuroanatomical terms and acronyms from selected cat and primate thalamic atlases and (2) facilitated the semi-automated generation of a highly granular thalamic terminology. The implementation of string-matching algorithms (1) reconciled terminological errors introduced by the optical character recognition (OCR) software used to generate the neuroanatomical text and (2) increased the sensitivity of matching the neuroanatomical terms and acronyms that the “hybrid” parser generated for the two neuroanatomical domains.
References
American Heritage Dictionary of the English Language, The: Fourth Edition. 2000, Houghton-Mifflin, Boston, MA.
Assadi, H. and Bourigault, D. (1996) Acquisition and modeling of knowledge starting from texts: data-processing tools and methodological elements. In: Acts of 10th Congress Pattern Recognition and Artificial Intelligence, Rennes, France.
Berman, A. L. and Jones, E. G. (1982) The Thalamus and Basal Telencephalon of the Cat. A Cytoarchitectonic Atlas with Stereotaxic Coordinates. University of Wisconsin Press, Madison, WI.
Chang, J., Schütze, H., and Altman, R. (1999) Creating an online dictionary of abbreviations from MEDLINE. J. Am. Med. Inform. Assoc. 9:612–620.
Crasto, C., Marenco, L., Miller, P., and Shepherd, G. (2002) Olfactory receptor database: a metadata-driven automated population from sources of gene and protein sequences. Nucleic Acids Res. 30:354–360.
Gardner, D., Abato, M., Knuth, K. H., Debellis, R., and Gardner, E. P. (2001a) A functional ontology for neuroinformatics. The Human Brain Project/Neuroinformatics Annual Spring Meeting, May 21–22, 2001, Bethesda, MD.
Gusfield, D. (1997) Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge University Press, Cambridge, UK.
Jacquemin, C. and Bourigault, D. (2002) Term extraction and automatic indexing. In: Handbook of Computational Linguistics. (Mitkov, R., ed.) Oxford University Press, Oxford, UK, Chapter 19.
Jones, E. G. (1998) The thalamus of primates. In: Handbook of Chemical Neuroanatomy, Volume 14. (Bloom, F. E., et al., eds.) Elsevier, Amsterdam, The Netherlands.
Chen, K.-H. and Chen, H.-H. (1994) Extracting noun phrases from large-scale texts: A hybrid approach and its automatic evaluation. In: 32nd Annual Meeting of the Association for Computational Linguistics, June 27–30, New Mexico State University, Las Cruces, NM.
Language Technology Group. http://www.ltg.ed.ac.uk/software/chunk/
Lopresti, D. and Wilfong, G. (1999) Cross-domain approximate string matching. Sixth International Symposium on String Processing and Information Retrieval. Cancun, Mexico, September 22–24, pp. 120–127.
Manning, C. D. and Schütze, H. (2000) Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, p. 83.
Maynard, D. and Ananiadou, S. (1999) Identifying contextual information for multi-word term extraction. In: 5th International Congress on Terminology and Knowledge Engineering (TKE99), pp. 212–221.
Monge, A. E. and Elkan, C. P. (1996) The field matching problem: Algorithms and applications. Second International Conference on Knowledge Discovery and Data Mining. (KDD96), Portland, OR, August 2–4, pp. 267–270.
Penn Tree Bank. http://www.cis.upenn.edu/~treebank/home.html
Qtag v 3.01, Portable POS Tagger. Oliver Mason, Department of English, School of Humanities, The University of Birmingham, UK. http://web.bham.ac.uk/O.Mason/
SPECIALIST Lexicon. http://www.nlm.nih.gov/research/umls/META4.HTML#s4
Zhu, J. J. and Ungar, L. H. (2000) String edit analysis for merging databases. Knowledge Discovery and Data Mining Workshop, August 20, Boston, MA. ACM SIGKDD, Jan 2001, Vol. 2, No. 2, p. 3.
Cite this article
Srinivas, P.R., Gusfield, D., Mason, O. et al. Neuroanatomical term generation and comparison between two terminologies. Neuroinform 1, 177–192 (2003). https://doi.org/10.1007/s12021-003-0004-z
DOI: https://doi.org/10.1007/s12021-003-0004-z