Combining Biological Databases and Text Mining to Support New Bioinformatics Applications

Witte, René; Baker, Christopher J. O.

doi:10.1007/11428817_28

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3513)

Included in the following conference series:

International Conference on Application of Natural Language to Information Systems

Abstract

A large amount of biological knowledge today is only available from full-text research papers. Since neither manual database curators nor users can keep up with the rapidly expanding volume of scientific literature, natural language processing approaches are becoming increasingly important for bioinformatic projects.

In this paper, we go beyond simply extracting information from full-text articles by describing an architecture that supports targeted access to information from biological databases using the results derived from text mining of research papers, thereby integrating information from both sources within a biological application.

The described architecture is currently being used to extract information about protein mutations from full-text research papers. Text mining results drive the retrieval of sequence information from protein databases and the use of algorithmic sequence analysis tools, which in turn facilitate further data access from protein structure databases. Complex mapping of NLP-derived text annotations to protein structures allows information that is not available in databases of mutation annotations to be rendered in a 3D structure visualization.
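
The flow described above (mutation mentions found in text, checked against a retrieved sequence, then mapped onto structure numbering) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the regular expression, the `Mutation` type, the function names, and the fixed numbering offset are all hypothetical and stand in for the NLP pipeline, database lookups, and alignment tools the abstract refers to.

```python
import re
from typing import List, NamedTuple


class Mutation(NamedTuple):
    """A point mutation mention such as E3Q (hypothetical helper type)."""
    wild_type: str
    position: int   # 1-based position as written in the paper
    mutant: str


# One-letter codes of the 20 standard amino acids.
AMINO = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: mutations are written as <wild type><position><mutant>, e.g. "E3Q".
# Real papers use many other notations that this pattern does not cover.
MUTATION_RE = re.compile(rf"\b([{AMINO}])(\d+)([{AMINO}])\b")


def extract_mutations(text: str) -> List[Mutation]:
    """Return every single-letter point mutation mention found in the text."""
    return [
        Mutation(m.group(1), int(m.group(2)), m.group(3))
        for m in MUTATION_RE.finditer(text)
    ]


def consistent_with_sequence(mutation: Mutation, sequence: str) -> bool:
    """Check that the sequence has the stated wild-type residue at that position."""
    index = mutation.position - 1
    return 0 <= index < len(sequence) and sequence[index] == mutation.wild_type


def map_to_structure(mutation: Mutation, offset: int) -> int:
    """Map a sequence position to structure numbering.

    Assumption: a constant offset is enough. In practice the mapping would
    come from a sequence alignment rather than a single number.
    """
    return mutation.position + offset
```

A mention that fails `consistent_with_sequence` would, in a sketch like this, simply be dropped before any structure annotation is attempted; how the actual system resolves such mismatches is not stated in the abstract.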

Author information

Authors and Affiliations

Institute for Program Structures and Data Organization (IPD), Universität Karlsruhe (TH), Germany
René Witte
Department of Computer Science and Software Engineering, Concordia University, Montréal, Québec, Canada
Christopher J. O. Baker

Editor information

Editors and Affiliations

Department of Software and Computing Systems, University of Alicante, Spain
Andrés Montoyo
Grupo de investigación del Procesamiento del Lenguaje y Sistemas de Información, Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante, Spain
Rafael Muñoz
Lab. CEDRIC, CNAM, Paris, France
Elisabeth Métais

About this paper

Cite this paper

Witte, R., Baker, C.J.O. (2005). Combining Biological Databases and Text Mining to Support New Bioinformatics Applications. In: Montoyo, A., Muñoz, R., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2005. Lecture Notes in Computer Science, vol 3513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11428817_28

DOI: https://doi.org/10.1007/11428817_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26031-8
Online ISBN: 978-3-540-32110-1
eBook Packages: Computer Science, Computer Science (R0)
