Abstract
An essential part of bioinformatic research is the iterative process of validating hypotheses by analyzing facts stored in databases and in the published literature. This process can be enhanced by language technology methods, in particular by automatic text understanding. Because it is becoming increasingly difficult to keep up with the vast number of scientific articles being published, there is a need for more easily accessible representations of current knowledge. The goal of the research described in this paper is to develop a system that supports large-scale research on metabolic and regulatory pathways by extracting relations between biological objects from descriptions found in the literature. We present and evaluate procedures for semantico-syntactic tagging, for dividing the text into parts concerning previous research and current research, for syntactic parsing, and for transforming syntactic trees into logical representations similar to the pathway graphs used in the Kyoto Encyclopaedia of Genes and Genomes.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gawronska, B., Erlendsson, B., Olsson, B. (2006). Towards an Automated Analysis of Biomedical Abstracts. In: Leser, U., Naumann, F., Eckman, B. (eds) Data Integration in the Life Sciences. DILS 2006. Lecture Notes in Computer Science, vol 4075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11799511_6
DOI: https://doi.org/10.1007/11799511_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36593-8
Online ISBN: 978-3-540-36595-2
eBook Packages: Computer Science, Computer Science (R0)