Towards an Automated Analysis of Biomedical Abstracts

  • Conference paper
Data Integration in the Life Sciences (DILS 2006)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4075)

Abstract

An essential part of bioinformatics research is the iterative process of validating hypotheses by analyzing facts stored in databases and in the published literature. This process can be enhanced by language technology methods, in particular by automatic text understanding. Because it is becoming increasingly difficult to keep up with the vast number of scientific articles being published, there is a need for more easily accessible representations of current knowledge. The goal of the research described in this paper is to develop a system that supports large-scale research on metabolic and regulatory pathways by extracting relations between biological objects from descriptions found in the literature. We present and evaluate procedures for four steps: semantico-syntactic tagging, division of the text into parts concerning previous research and parts concerning current research, syntactic parsing, and transformation of syntactic trees into logical representations similar to the pathway graphs used in the Kyoto Encyclopaedia of Genes and Genomes (KEGG).
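As a rough illustration of the kind of output such a pipeline aims at, the following minimal sketch turns two sentences into a small labeled graph. It is not the authors' method: a regular expression stands in for the tagging and syntactic parsing steps, and the cue phrases, relation verbs, function names, and entity names (ProteinA, GeneC, and so on) are all hypothetical placeholders chosen for the example.

```python
import re
from collections import defaultdict

# Hypothetical cue phrases for separating previous from current research;
# the abstract does not state which criteria the paper actually uses.
PREVIOUS_CUES = ("previously", "has been shown", "have been shown")
CURRENT_CUES = ("we show", "we report", "here we", "in this study")

# Hypothetical relation verbs mapped to edge labels of the output graph.
RELATION_VERBS = {
    "activates": "activation",
    "inhibits": "inhibition",
    "phosphorylates": "phosphorylation",
}

# A regex stands in for real syntactic parsing: <entity> <verb> <entity>.
TRIPLE_RE = re.compile(
    r"\b([A-Za-z0-9-]+)\s+(" + "|".join(RELATION_VERBS) + r")\s+([A-Za-z0-9-]+)"
)


def classify_sentence(sentence):
    """Label a sentence as 'current', 'previous', or 'unknown' research."""
    lowered = sentence.lower()
    if any(cue in lowered for cue in CURRENT_CUES):
        return "current"
    if any(cue in lowered for cue in PREVIOUS_CUES):
        return "previous"
    return "unknown"


def extract_relations(sentence):
    """Return (source, relation, target) triples matched in one sentence."""
    return [(s, RELATION_VERBS[v], o) for s, v, o in TRIPLE_RE.findall(sentence)]


def build_graph(text):
    """Map each source entity to a list of (relation, target, status) edges."""
    graph = defaultdict(list)
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        status = classify_sentence(sentence)
        for source, relation, target in extract_relations(sentence):
            graph[source].append((relation, target, status))
    return dict(graph)


example = (
    "It has been shown that ProteinA phosphorylates ProteinB. "
    "Here we report that GeneC inhibits GeneD."
)
graph = build_graph(example)
for source, edges in graph.items():
    print(source, edges)
```

Keeping the previous/current label on each edge means a downstream consumer can filter out relations that an abstract merely restates from earlier work. A real system would replace the regex with a parser and a proper entity lexicon.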

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gawronska, B., Erlendsson, B., Olsson, B. (2006). Towards an Automated Analysis of Biomedical Abstracts. In: Leser, U., Naumann, F., Eckman, B. (eds) Data Integration in the Life Sciences. DILS 2006. Lecture Notes in Computer Science, vol 4075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11799511_6

  • DOI: https://doi.org/10.1007/11799511_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-36593-8

  • Online ISBN: 978-3-540-36595-2

  • eBook Packages: Computer Science, Computer Science (R0)
