Semantic Annotation of Biomedical Literature Using Google

  • Conference paper
Computational Science and Its Applications – ICCSA 2005 (ICCSA 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3482)

Abstract

With the increasing amount of biomedical literature, there is a need for automatic extraction of information to support biomedical researchers. Because biomedical information databases are incomplete, dictionary-based extraction is not straightforward, and several approaches using contextual rules and machine learning have previously been proposed. Our work is inspired by these approaches, but is novel in that it uses Google for semantic annotation of biomedical words. The semantic annotation accuracy obtained, 52% on words not found in the Brown Corpus, Swiss-Prot or LocusLink (accessed using Gsearch.org), justifies further work in this direction.
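
The evaluation setting described in the abstract (scoring only words that no dictionary covers) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy word sets standing in for the Brown Corpus, Swiss-Prot and LocusLink, the gold labels, and the stub annotator. It does not reproduce the paper's Google-based method, which the abstract does not detail.

```python
from typing import Callable, Dict, Iterable, List


def unknown_words(tokens: Iterable[str], dictionaries: Iterable[Iterable[str]]) -> List[str]:
    """Return the tokens that appear in none of the dictionaries (case-insensitive)."""
    known = set()
    for dictionary in dictionaries:
        known.update(word.lower() for word in dictionary)
    return [token for token in tokens if token.lower() not in known]


def annotation_accuracy(words: List[str],
                        annotate: Callable[[str], str],
                        gold: Dict[str, str]) -> float:
    """Fraction of words whose predicted label equals the gold label."""
    if not words:
        return 0.0
    correct = sum(1 for word in words if annotate(word) == gold.get(word))
    return correct / len(words)


# Toy stand-ins for the three resources named in the abstract (hypothetical contents).
brown = {"the", "binds", "to"}
swiss_prot = {"gastrin"}
locuslink = {"CCKBR"}

tokens = ["Gastrin", "binds", "to", "the", "CCK2R", "receptor-like", "CCKBR"]
todo = unknown_words(tokens, [brown, swiss_prot, locuslink])


def stub_annotate(word: str) -> str:
    """Placeholder annotator; a real system would query an external source here."""
    return "protein" if any(ch.isdigit() for ch in word) else "other"


# Hypothetical gold labels for the uncovered words only.
gold = {"CCK2R": "protein", "receptor-like": "protein"}
accuracy = annotation_accuracy(todo, stub_annotate, gold)
```

Keeping the annotator as a plain callable means the dictionary filter and the accuracy measure stay unchanged if a different annotation source is swapped in.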



Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sætre, R., Tveit, A., Steigedal, T.S., Lægreid, A. (2005). Semantic Annotation of Biomedical Literature Using Google. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2005. ICCSA 2005. Lecture Notes in Computer Science, vol 3482. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424857_36

  • DOI: https://doi.org/10.1007/11424857_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25862-9

  • Online ISBN: 978-3-540-32045-6

  • eBook Packages: Computer Science, Computer Science (R0)
