Semantic Annotation of Biomedical Literature Using Google

Sætre, Rune; Tveit, Amund; Steigedal, Tonje S.; Lægreid, Astrid

doi:10.1007/11424857_36

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3482)

Included in the following conference series: International Conference on Computational Science and Its Applications

Abstract

With the increasing amount of biomedical literature, there is a need for automatic extraction of information to support biomedical researchers. Because biomedical information databases are incomplete, dictionary-based extraction is not straightforward, and several approaches using contextual rules and machine learning have previously been proposed. Our work is inspired by these approaches but is novel in that it uses Google for semantic annotation of biomedical words. The semantic annotation accuracy obtained, 52% on words not found in the Brown Corpus, Swiss-Prot or LocusLink (accessed using Gsearch.org), justifies further work in this direction.

Author information

Authors and Affiliations

Department of Computer and Information Science, Norwegian University of Science and Technology, NO-7491, Trondheim, Norway
Rune Sætre & Amund Tveit
Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, NO-7491, Trondheim, Norway
Tonje S. Steigedal & Astrid Lægreid
Norwegian Center for Patient Record Research, Norwegian University of Science and Technology, NO-7491, Trondheim, Norway
Amund Tveit

Editor information

Editors and Affiliations

Department of Mathematics and Computer Science, University of Perugia, via Vanvitelli, 1, I-06123, Perugia, Italy
Osvaldo Gervasi
Department of Computer Science, University of Calgary, 2500 University Drive N.W., T2N 1N4, Calgary, AB, Canada
Marina L. Gavrilova
William Norris Professor, Head of the Computer Science and Engineering Department, University of Minnesota, USA
Vipin Kumar
Department of Chemistry, University of Perugia, Via Elce di Sotto, 8, I-06123, Perugia, Italy
Antonio Laganà
Institute of High Performance Computing, IHPC, 1 Science Park Road, 01-01 The Capricorn, Singapore Science Park II, 117528, Singapore
Heow Pueh Lee
School of Computing, Soongsil University, Seoul, Korea
Youngsong Mun
Clayton School of IT, Monash University, 3800, Clayton, Australia
David Taniar
OptimaNumerics Ltd, Belfast, United Kingdom
Chih Jeng Kenneth Tan

About this paper

Cite this paper

Sætre, R., Tveit, A., Steigedal, T.S., Lægreid, A. (2005). Semantic Annotation of Biomedical Literature Using Google. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2005. ICCSA 2005. Lecture Notes in Computer Science, vol 3482. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424857_36

DOI: https://doi.org/10.1007/11424857_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25862-9
Online ISBN: 978-3-540-32045-6
eBook Packages: Computer Science, Computer Science (R0)
