Algorithm for Grounding Mutation Mentions from Text to Protein Sequences

  • Conference paper
Data Integration in the Life Sciences (DILS 2010)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6254)

Abstract

Protein mutations derived from in vitro experimental analysis are described in detail in scientific papers. Reuse of mutation impact annotations is an important subfield of bioinformatics, and mutation grounding is a critical step in it. We present a method for grounding textual mentions of mutational changes to proteins found in papers. We distinguish between grounding mutation entities to protein database identifiers and grounding them to the correct positions on sequences extracted from protein databases. The grounding workflow coordinates the extraction of mutation, protein, and organism mentions from texts and uses these to identify target sequences. Mutation mentions are sequentially mapped onto candidate proteins to facilitate their correct grounding to a protein sequence, independent of a protein-mutation tuple extraction task. Using a gold standard corpus of full-text articles and corresponding protein sequences, we show high precision and recall and discuss novel aspects of the algorithm in the context of previous work.
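The idea of mapping mutation mentions onto candidate proteins can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify in detail. It assumes point mutations written as wild-type residue, 1-based position, mutant residue (for example "A123T"), and it scores each candidate sequence by how many mentions find their wild-type residue at the stated position. The function names, the mention format, and the toy sequences are all hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed mention format: wild-type residue, 1-based position, mutant residue.
MUTATION_RE = re.compile(
    r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$"
)

Mutation = Tuple[str, int, str]


def parse_mutation(mention: str) -> Optional[Mutation]:
    """Parse a mention such as 'A123T'; return None if it does not match."""
    m = MUTATION_RE.match(mention.strip())
    if m is None:
        return None
    return m.group(1), int(m.group(2)), m.group(3)


def count_matches(sequence: str, mutations: List[Mutation]) -> int:
    """Count mutations whose wild-type residue sits at the stated position."""
    hits = 0
    for wild_type, position, _mutant in mutations:
        index = position - 1  # convert 1-based position to 0-based index
        if 0 <= index < len(sequence) and sequence[index] == wild_type:
            hits += 1
    return hits


def ground_mutations(
    mentions: List[str], candidates: Dict[str, str]
) -> Optional[Tuple[str, int]]:
    """Return (protein_id, hits) for the best-matching candidate, or None."""
    mutations = [p for p in map(parse_mutation, mentions) if p is not None]
    best: Optional[Tuple[str, int]] = None
    for protein_id, sequence in candidates.items():
        hits = count_matches(sequence, mutations)
        if hits > 0 and (best is None or hits > best[1]):
            best = (protein_id, hits)
    return best


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    toy_candidates = {"P1": "MKTAYIAK", "P2": "MATGYLAK"}
    print(ground_mutations(["K2A", "A4G", "Y5F"], toy_candidates))
```

In this sketch a candidate is accepted only if at least one mention matches. A real system would also need to handle numbering offsets between the paper and the database sequence, which this example ignores.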

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bergman Laurila, J., Kanagasabai, R., Baker, C.J.O. (2010). Algorithm for Grounding Mutation Mentions from Text to Protein Sequences. In: Lambrix, P., Kemp, G. (eds) Data Integration in the Life Sciences. DILS 2010. Lecture Notes in Computer Science, vol 6254. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15120-0_10

  • DOI: https://doi.org/10.1007/978-3-642-15120-0_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15119-4

  • Online ISBN: 978-3-642-15120-0

  • eBook Packages: Computer Science (R0)