Validating Candidate Gene-Mutation Relations in MEDLINE Abstracts via Crowdsourcing

Burger, John D.; Doughty, Emily; Bayer, Sam; Tresner-Kirsch, David; Wellner, Ben; Aberdeen, John; Lee, Kyungjoon; Kann, Maricel G.; Hirschman, Lynette

doi:10.1007/978-3-642-31040-9_8

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7348)

Abstract

We describe an experiment to elicit judgments on the validity of gene-mutation relations in MEDLINE abstracts via crowdsourcing. The biomedical literature contains rich information on such relations, but the correct pairings are difficult to extract automatically because a single abstract may mention multiple genes and mutations. We presented candidate gene-mutation relations to reviewers as Amazon Mechanical Turk HITs (human intelligence tasks). Candidate mutations were extracted from a corpus of 250 MEDLINE abstracts using EMU, combined with curated gene lists from NCBI. The resulting document-level annotations were projected into the abstract text to highlight mentions of genes and mutations for review. Reviewers returned results within 36 hours. Initial weighted results, evaluated against a gold standard of expert-curated gene-mutation relations, achieved 85% accuracy, with the best reviewer achieving 91% accuracy. We expect performance to increase with further experimentation, providing a scalable approach for rapid manual curation of important biological relations.
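The abstract does not say how candidate pairs were formed or how reviewer judgments were weighted, so the following is only a minimal sketch under stated assumptions: every gene mentioned in an abstract is paired with every mutation mentioned in it, each reviewer carries a weight (here a hypothetical per-reviewer reliability score), and a candidate is accepted when the weighted "valid" votes outweigh the weighted "invalid" votes. All function names, reviewer IDs, weights, and gene/mutation labels below are invented for illustration and are not taken from the paper.

```python
from collections import defaultdict
from itertools import product


def candidate_pairs(genes, mutations):
    """Assumption: every gene-mutation pairing in one abstract is a candidate."""
    return list(product(genes, mutations))


def weighted_vote(judgments, weights):
    """Aggregate yes/no judgments per candidate, weighting each reviewer.

    judgments: iterable of (candidate, reviewer, is_valid) tuples.
    weights:   reviewer -> non-negative weight (hypothetical reliability score);
               reviewers missing from the mapping default to weight 1.0.
    Returns candidate -> bool, True when the weighted "valid" mass is larger.
    """
    score = defaultdict(float)
    for candidate, reviewer, is_valid in judgments:
        w = weights.get(reviewer, 1.0)
        score[candidate] += w if is_valid else -w
    return {c: s > 0 for c, s in score.items()}


def accuracy(predicted, gold):
    """Fraction of gold-standard candidates whose label was predicted correctly."""
    correct = sum(predicted.get(c) == label for c, label in gold.items())
    return correct / len(gold)


# Toy data, invented for illustration only.
pairs = candidate_pairs(["GENE_A", "GENE_B"], ["M1"])
judgments = [
    (("GENE_A", "M1"), "r1", False),
    (("GENE_A", "M1"), "r2", True),
    (("GENE_A", "M1"), "r3", False),
    (("GENE_B", "M1"), "r1", True),
    (("GENE_B", "M1"), "r2", False),
    (("GENE_B", "M1"), "r3", True),
]
weights = {"r1": 0.9, "r2": 0.6, "r3": 0.8}
gold = {("GENE_A", "M1"): False, ("GENE_B", "M1"): True}

labels = weighted_vote(judgments, weights)
print(accuracy(labels, gold))
```

In this sketch the reviewer weights are fixed by hand; in practice they might be estimated from control items or inter-reviewer agreement, which is a design choice the abstract leaves open.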

Author information

Authors and Affiliations

The MITRE Corporation, Bedford, MA, USA
John D. Burger, Sam Bayer, David Tresner-Kirsch, Ben Wellner, John Aberdeen & Lynette Hirschman
University of Maryland, Baltimore County, Baltimore, MD, USA
Emily Doughty & Maricel G. Kann
Harvard Medical School, Boston, MA, USA
Kyungjoon Lee

Editor information

Editors and Affiliations

US National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, 20894, Bethesda, MD, USA
Olivier Bodenreider & Bastien Rance

Cite this paper

Burger, J.D. et al. (2012). Validating Candidate Gene-Mutation Relations in MEDLINE Abstracts via Crowdsourcing. In: Bodenreider, O., Rance, B. (eds) Data Integration in the Life Sciences. DILS 2012. Lecture Notes in Computer Science, vol 7348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31040-9_8

DOI: https://doi.org/10.1007/978-3-642-31040-9_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31039-3
Online ISBN: 978-3-642-31040-9
eBook Packages: Computer Science, Computer Science (R0)
