Abstract
With the growing volume of biomedical literature, biomedical researchers need automatic information extraction to support their work. Because biomedical information databases are incomplete, dictionary-based extraction is not straightforward, and several approaches using contextual rules and machine learning have been proposed. Our work is inspired by these approaches, but is novel in that it uses Google for semantic annotation of biomedical words. The semantic annotation accuracy obtained, 52% on words not found in the Brown Corpus, Swiss-Prot or LocusLink (accessed using Gsearch.org), justifies further work in this direction.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sætre, R., Tveit, A., Steigedal, T.S., Lægreid, A. (2005). Semantic Annotation of Biomedical Literature Using Google. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2005. ICCSA 2005. Lecture Notes in Computer Science, vol 3482. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424857_36
DOI: https://doi.org/10.1007/11424857_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25862-9
Online ISBN: 978-3-540-32045-6
eBook Packages: Computer Science, Computer Science (R0)