Algorithm for Grounding Mutation Mentions from Text to Protein Sequences

Bergman Laurila, Jonas; Kanagasabai, Rajaraman; Baker, Christopher J. O.

doi:10.1007/978-3-642-15120-0_10

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6254)

Included in the following conference series:

International Conference on Data Integration in the Life Sciences

Abstract

Protein mutations derived from in vitro experimental analysis are described in detail in scientific papers. Reuse of mutation impact annotations is an important subfield of bioinformatics, and mutation grounding is a critical step in it. We present a method for grounding textual mentions of mutational changes to proteins found in papers. We distinguish between grounding mutation entities to protein database identifiers and grounding them to the correct positions on sequences extracted from protein databases. The grounding workflow coordinates the extraction of mutation, protein and organism mentions from texts and uses these to identify target sequences. Mutation mentions are sequentially mapped onto candidate proteins to facilitate their correct grounding to a protein sequence, independent of a protein-mutation tuple extraction task. Using a gold standard corpus of full-text articles and corresponding protein sequences, we show high precision and recall and discuss novel aspects of the algorithm in the context of previous work.
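
The idea of mapping mutation mentions onto candidate protein sequences can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details are not given in the abstract; it only assumes that a point mutation mention has the form wild-type residue, 1-based position, mutant residue (for example `K2A`), and that a candidate sequence is plausible when its residue at that position equals the wild-type residue. The names `parse_mutation`, `matches` and `ground_mentions`, and the toy sequences, are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed mention format: wild-type residue, 1-based position, mutant residue.
MUTATION_RE = re.compile(
    r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$"
)


def parse_mutation(mention: str) -> Optional[Tuple[str, int, str]]:
    """Split a mention such as 'K2A' into (wild type, position, mutant)."""
    m = MUTATION_RE.match(mention)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), m.group(3)


def matches(sequence: str, wild_type: str, position: int) -> bool:
    """True if the sequence has the wild-type residue at the 1-based position."""
    index = position - 1
    return 0 <= index < len(sequence) and sequence[index] == wild_type


def ground_mentions(
    mentions: List[str], candidates: Dict[str, str]
) -> Tuple[Optional[str], List[str]]:
    """Return the candidate that the most mentions fit, with those mentions.

    Scoring by number of consistent mentions is an illustrative assumption.
    """
    parsed = [p for p in map(parse_mutation, mentions) if p is not None]
    best_id: Optional[str] = None
    best_hits: List[str] = []
    for protein_id, sequence in candidates.items():
        hits = [
            f"{wt}{pos}{mt}"
            for wt, pos, mt in parsed
            if matches(sequence, wt, pos)
        ]
        if len(hits) > len(best_hits):
            best_id, best_hits = protein_id, hits
    return best_id, best_hits


# Toy candidate sequences (made up for illustration only).
candidates = {"P1": "MKTAYIAK", "P2": "MATAYLAK"}
best_id, grounded = ground_mentions(["K2A", "I6L", "Y5F"], candidates)
```

With these toy inputs all three mentions are consistent with `P1`, while only `Y5F` is consistent with `P2`, so `P1` is selected. A realistic system would also need to handle numbering offsets between the paper and the database sequence, which this sketch ignores.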

Author information

Authors and Affiliations

University of New Brunswick, Saint John, New Brunswick, Canada
Jonas Bergman Laurila & Christopher J. O. Baker
Institute for Infocomm Research, Singapore
Rajaraman Kanagasabai

Editor information

Editors and Affiliations

Department of Computer and Information Science, Linköpings universitet, 581 83, Linköping, Sweden
Patrick Lambrix
Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, 412 96, Gothenburg, Sweden
Graham Kemp

About this paper

Cite this paper

Bergman Laurila, J., Kanagasabai, R., Baker, C.J.O. (2010). Algorithm for Grounding Mutation Mentions from Text to Protein Sequences. In: Lambrix, P., Kemp, G. (eds) Data Integration in the Life Sciences. DILS 2010. Lecture Notes in Computer Science, vol 6254. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15120-0_10

DOI: https://doi.org/10.1007/978-3-642-15120-0_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15119-4
Online ISBN: 978-3-642-15120-0
eBook Packages: Computer Science (R0)
