NLP-Based Curation of Bacterial Regulatory Networks

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2007)

Abstract

Manual curation of biological databases is an expensive and labor-intensive process in Genomics and Systems Biology. We report the implementation of a state-of-the-art, rule-based Natural Language Processing system that creates computer-readable networks of regulatory interactions directly from abstracts and full-text papers. We evaluate its output against a manually curated standard database, and test the possibilities and limitations of automatic and semi-automatic curation of the so-called biobibliome. We also propose a novel Regulatory Interaction Mining Markup Language suited for representing this data, useful both for biologists and for text-mining specialists.

Editor information

Alexander Gelbukh

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rodríguez-Penagos, C., Salgado, H., Martínez-Flores, I., Collado-Vides, J. (2007). NLP-Based Curation of Bacterial Regulatory Networks. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_51

  • DOI: https://doi.org/10.1007/978-3-540-70939-8_51

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70938-1

  • Online ISBN: 978-3-540-70939-8

  • eBook Packages: Computer Science (R0)
