Validating Candidate Gene-Mutation Relations in MEDLINE Abstracts via Crowdsourcing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7348)

Abstract

We describe an experiment to elicit judgments on the validity of gene-mutation relations in MEDLINE abstracts via crowdsourcing. The biomedical literature contains rich information on such relations, but the correct pairings are difficult to extract automatically because a single abstract may mention multiple genes and mutations. We presented candidate gene-mutation relations as Amazon Mechanical Turk HITs (human intelligence tasks). We extracted candidate mutations from a corpus of 250 MEDLINE abstracts using EMU, combined with curated gene lists from NCBI. The resulting document-level annotations were projected into the abstract text to highlight mentions of genes and mutations for review. Reviewers returned results within 36 hours. Initial weighted results, evaluated against a gold standard of expert-curated gene-mutation relations, achieved 85% accuracy, with the best reviewer achieving 91% accuracy. We expect performance to increase with further experimentation, providing a scalable approach for rapid manual curation of important biological relations.
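
The abstract does not say how candidates were paired or how reviewer judgments were weighted, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that candidates are every gene-mutation pairing within one abstract, that each reviewer gives a yes/no judgment per candidate, and that each reviewer carries a single numeric weight. The function names (`candidate_pairs`, `weighted_vote`, `accuracy`), the reviewer weights, and the toy judgments are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def candidate_pairs(genes, mutations):
    """Return every gene-mutation pairing for one abstract (assumed scheme)."""
    return list(product(sorted(genes), sorted(mutations)))


def weighted_vote(judgments, weights):
    """Combine yes/no judgments per candidate using per-reviewer weights.

    judgments: iterable of (reviewer, candidate, is_valid) tuples.
    weights: mapping from reviewer to a weight; unknown reviewers get 1.0.
    A candidate is accepted when the signed weight total is positive.
    """
    score = defaultdict(float)
    for reviewer, candidate, is_valid in judgments:
        w = weights.get(reviewer, 1.0)
        score[candidate] += w if is_valid else -w
    return {cand: total > 0 for cand, total in score.items()}


def accuracy(predicted, gold):
    """Fraction of gold-standard candidates whose label was predicted correctly."""
    correct = sum(predicted.get(cand) == label for cand, label in gold.items())
    return correct / len(gold)


# Toy data, invented for illustration only (not taken from the paper).
pairs = candidate_pairs({"BRCA1", "TP53"}, {"R175H"})
gold = {("BRCA1", "R175H"): False, ("TP53", "R175H"): True}
judgments = [
    ("w1", ("BRCA1", "R175H"), False),
    ("w1", ("TP53", "R175H"), True),
    ("w2", ("BRCA1", "R175H"), True),
    ("w2", ("TP53", "R175H"), True),
    ("w3", ("BRCA1", "R175H"), True),
    ("w3", ("TP53", "R175H"), False),
]
weights = {"w1": 0.9, "w2": 0.5, "w3": 0.3}  # hypothetical reviewer weights

consensus = weighted_vote(judgments, weights)
print(accuracy(consensus, gold))
```

In this toy case an unweighted majority would accept the BRCA1 pairing (two yes votes against one no), while the weighted total rejects it because the most heavily weighted reviewer voted no. How such weights would be estimated in practice, for example from agreement with a small set of gold items, is left open here.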

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Burger, J.D. et al. (2012). Validating Candidate Gene-Mutation Relations in MEDLINE Abstracts via Crowdsourcing. In: Bodenreider, O., Rance, B. (eds) Data Integration in the Life Sciences. DILS 2012. Lecture Notes in Computer Science, vol 7348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31040-9_8

  • DOI: https://doi.org/10.1007/978-3-642-31040-9_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31039-3

  • Online ISBN: 978-3-642-31040-9

  • eBook Packages: Computer Science, Computer Science (R0)
