Abstract
We describe an experiment to elicit judgments on the validity of gene-mutation relations in MEDLINE abstracts via crowdsourcing. The biomedical literature contains rich information on such relations, but the correct pairings are difficult to extract automatically because a single abstract may mention multiple genes and mutations. We ran an experiment presenting candidate gene-mutation relations as Amazon Mechanical Turk HITs (human intelligence tasks). We extracted candidate mutations from a corpus of 250 MEDLINE abstracts using EMU combined with curated gene lists from NCBI. The resulting document-level annotations were projected into the abstract text to highlight mentions of genes and mutations for review. Reviewers returned results within 36 hours. Initial weighted results evaluated against a gold standard of expert curated gene-mutation relations achieved 85% accuracy, with the best reviewer achieving 91% accuracy. We expect performance to increase with further experimentation, providing a scalable approach for rapid manual curation of important biological relations.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Amberger, J., Bocchini, C.A., Scott, A.F., Hamosh, A.: McKusick’s Online Mendelian Inheritance in Man (OMIM). Nucleic Acids Res. 37(Database issue), 793–796 (2009)
Sherry, S.T., Ward, M.H., Kholodov, M., Baker, J., Phan, L., Smigielski, E.M., Sirotkin, K.: dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29(1), 308–311 (2001)
Thorisson, G.A., Lancaster, O., Free, R.C., Hastings, R.K., Sarmah, P., Dash, D., Brahmachari, S.K., Brookes, A.J.: HGVbaseG2P: a central genetic association database. Nucleic Acids Res. 37(Database issue), D797–D802 (2009)
Stenson, P.D., Ball, E.V., Howells, K., Phillips, A.D., Mort, M., Cooper, D.N.: The Human Gene Mutation Database: providing a comprehensive central mutation database for molecular diagnostics and personalized genomics. Human Genomics 4(2), 69–72 (2009)
Samuels, M.E., Rouleau, G.A.: The case for locus-specific databases. Nat. Rev. Genet. 12(6), 378–379 (2011)
Klein, R.J., Zeiss, C., Chew, E.Y., Tsai, J.Y., Sackler, R.S., Haynes, C., Henning, A.K., SanGiovanni, J.P., Mane, S.M., Mayne, S.T., Bracken, M.B., Ferris, F.L., Ott, J., Barnstable, C., Hoh, J.: Complement factor H polymorphism in age-related macular degeneration. Science 308(5720), 385–389 (2005)
Denny, J.C., Ritchie, M.D., Basford, M.A., Pulley, J.M., Bastarache, L., Brown-Gentry, K., Wang, D., Masys, D.R., Roden, D.M., Crawford, D.C.: PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 26(9), 1205–1210 (2010)
Tatonetti, N.P., Dudley, J.T., Sagreiya, H., Butte, A.J., Altman, R.B.: An integrative method for scoring candidate genes from association studies: application to warfarin dosing. BMC Bioinformatics 11(suppl. 9), S9 (2010)
Caporaso, J.G., Baumgartner Jr., W.A., Randolph, D.A., Cohen, K.B., Hunter, L.: MutationFinder: a high-performance system for extracting point mutation mentions from text. Bioinformatics 23(14), 1862–1865 (2007)
Doughty, E., Kertesz-Farkas, A., Bodenreider, O., Thompson, G., Adadey, A., Peterson, T., Kann, M.G.: Toward an automatic method for extracting cancer- and other disease-related point mutations from the biomedical literature. Bioinformatics 27(3), 408–415 (2011)
Winnenburg, R., Plake, C., Schroeder, M.: Improved mutation tagging with gene identifiers applied to membrane protein stability prediction. BMC Bioinformatics 10(suppl. 8), S3 (2009)
Rebholz-Schuhmann, D., Marcel, S., Albert, S., Tolle, R., Casari, G., Kirsch, H.: Automatic extraction of mutations from Medline and cross-validation with OMIM. Nucleic Acids Res. 32(1), 135–142 (2004)
Horn, F., Lau, A.L., Cohen, F.E.: Automated extraction of mutation data from the literature: application of MuteXt to G protein-coupled receptors and nuclear hormone receptors. Bioinformatics 20(4), 557–568 (2004)
Erdogmus, M., Sezerman, O.U.: Application of automatic mutation-gene pair extraction to diseases. J. Bioinform. Comput. Biol. 5(6), 1261–1275 (2007)
Aronson, A.R.: Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In: Proc. AMIA Symp., pp. 17–21 (2001)
Aberdeen, J., Bayer, S., Yeniterzi, R., Wellner, B., Clark, C., Hanauer, D., Malin, B., Hirschman, L.: The MITRE Identification Scrubber Toolkit: design, training, and assessment. International Journal of Medical Informatics 79(12), 849–859 (2010)
Callison-Burch, C., Dredze, M.: Creating speech and language data with Amazon’s Mechanical Turk NAACL HLT. In: 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk. Association for Computational Linguistics, Los Angeles (2010)
Yetisgen-Yildiz, M., Solti, I., Xia, F., Halgrim, S.: Preliminary Experiments with Amazon’s Mechanical Turk for Annotating Medical Named Entities. In: NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, pp. 180–183. Assn for Comp. Ling, Los Angeles (2010)
Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M.C., Estreicher, A., Gasteiger, E., Martin, M.J., Michoud, K., O’Donovan, C., Phan, I., Pilbout, S., Schneider, M.: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31(1), 365–370 (2003)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Burger, J.D. et al. (2012). Validating Candidate Gene-Mutation Relations in MEDLINE Abstracts via Crowdsourcing. In: Bodenreider, O., Rance, B. (eds) Data Integration in the Life Sciences. DILS 2012. Lecture Notes in Computer Science(), vol 7348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31040-9_8
Download citation
DOI: https://doi.org/10.1007/978-3-642-31040-9_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31039-3
Online ISBN: 978-3-642-31040-9
eBook Packages: Computer ScienceComputer Science (R0)