DOI: 10.1145/2649387.2660786
Poster

Text mining tools for assisting literature curation

Published: 20 September 2014

Abstract

Today's biomedical research has become heavily dependent on access to biological knowledge encoded in expert-curated biological databases (e.g., Swiss-Prot). As the volume of biological literature grows rapidly, it becomes increasingly difficult for human curators to keep up, because manual curation is expensive and time-consuming. Past research has shown that (semi-)automated approaches have the potential to greatly improve the productivity of manual curation [1-3]. We recently developed PubTator, a web-based application that assists literature curation by integrating various text-mining tools [4-6].
PubTator has several unique features. First, PubTator is web-based: no installation is required, and it runs on any computing platform that has a web browser. Second, PubTator features a PubMed-like interface that many human curators find familiar and easy to use with minimal training. Third, PubTator integrates multiple competition-winning text-mining approaches that we recently developed for recognizing important biological entities: genes/proteins, diseases, mutations, chemicals/drugs, and organisms [7-11]. Hence, its text-mined results reflect state-of-the-art performance. Lastly, PubTator stays synchronized with PubMed content through nightly updates. Interested users can access our text-mined results via (a) the PubTator web interface, (b) a RESTful API, or (c) FTP download.
We have conducted a formal text-mining-aided curation experiment, which showed that PubTator greatly improved both curation efficiency and accuracy [6]. More recently, PubTator has been successfully deployed in practice to curate the CDC's human genome epidemiology knowledge base. We therefore conclude that our text-mining tools and PubTator can provide practical benefits to literature curation in bioinformatics research.
PubTator is freely available at: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/
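For readers who retrieve text-mined results in bulk (e.g., via FTP download), the following is a minimal sketch of loading them into Python. It assumes the results use the tab-delimited PubTator layout: a `PMID|t|title` line, a `PMID|a|abstract` line, then one `PMID<TAB>start<TAB>end<TAB>mention<TAB>type<TAB>concept ID` line per entity. That layout is an assumption not stated in this abstract, and the names `parse_pubtator`, `Document`, and `Annotation`, as well as the sample record and its placeholder identifiers, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """One entity mention (assumed PubTator tab-delimited layout)."""
    start: int          # character offset where the mention begins
    end: int            # character offset just past the mention
    mention: str        # surface text of the entity
    entity_type: str    # e.g. "Gene", "Disease", "Chemical"
    concept_id: str     # normalized identifier; empty if absent


@dataclass
class Document:
    pmid: str
    title: str = ""
    abstract: str = ""
    annotations: list = field(default_factory=list)


def parse_pubtator(text: str) -> dict:
    """Hypothetical helper: parse PubTator-style records into {pmid: Document}."""
    docs = {}
    for raw in text.splitlines():
        line = raw.rstrip("\r")
        if not line.strip():
            continue  # blank lines separate documents
        # Title/abstract lines look like "PMID|t|text" or "PMID|a|text".
        parts = line.split("|", 2)
        if len(parts) == 3 and parts[1] in ("t", "a") and "\t" not in parts[0]:
            pmid, kind, body = parts
            doc = docs.setdefault(pmid, Document(pmid))
            if kind == "t":
                doc.title = body
            else:
                doc.abstract = body
            continue
        # Annotation lines: PMID, start, end, mention, type[, concept ID].
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # skip lines matching neither layout
        pmid = fields[0]
        concept_id = fields[5] if len(fields) > 5 else ""
        doc = docs.setdefault(pmid, Document(pmid))
        doc.annotations.append(
            Annotation(int(fields[1]), int(fields[2]), fields[3], fields[4], concept_id)
        )
    return docs


if __name__ == "__main__":
    # Hypothetical record; identifiers are placeholders, not real concept IDs.
    sample = (
        "1000|t|BRCA1 mutations in breast cancer\n"
        "1000|a|A hypothetical abstract.\n"
        "1000\t0\t5\tBRCA1\tGene\tEXAMPLE_GENE_ID\n"
        "1000\t19\t32\tbreast cancer\tDisease\tEXAMPLE_DISEASE_ID\n"
    )
    for ann in parse_pubtator(sample)["1000"].annotations:
        print(ann.entity_type, ann.mention, ann.concept_id)
```

The title/abstract check rejects any line whose first `|`-separated field contains a tab, so an annotation whose mention happens to contain `|t|` is still read as an annotation. Offsets are assumed to index into the title, with the abstract following after a single space; verify this against the actual files before relying on it.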

References

[1]
Van Auken, K., Jaffery, J., Chan, J., Müller, H.-M. and Sternberg, P. W. 2009. Semi-automated curation of protein subcellular localization: a text mining-based approach to Gene Ontology (GO) Cellular Component curation. BMC Bioinformatics 10, 1 (2009), 228. DOI=http://dx.doi.org/10.1186/1471-2105-10-228
[2]
Alex, B., Grover, C., Haddow, B., Kabadjov, M., Klein, E., Matthews, M., Roebuck, S., Tobin, R. and Wang, X. 2008. Assisted Curation: Does Text Mining Really Help? In Proceedings of the Pacific Symposium on Biocomputing (Hawaii, USA, 2008). Citeseer
[3]
Donaldson, I., Martin, J., De Bruijn, B., Wolting, C., Lay, V., Tuekam, B., Zhang, S., Baskin, B., Bader, G. D. and Michalickova, K. 2003. PreBIND and Textomy--mining the biomedical literature for protein-protein interactions using a support vector machine. BMC Bioinformatics 4, 1 (2003), 11. DOI=http://dx.doi.org/10.1186/1471-2105-4-11
[4]
Wei, C.-H., Kao, H.-Y. and Lu, Z. 2013. PubTator: a Web-based text mining tool for assisting Biocuration. Nucleic Acids Research 41, Web Server issue (2013), W518--W522. DOI=http://dx.doi.org/10.1093/nar/gkt44
[5]
Wei, C.-H., Kao, H.-Y. and Lu, Z. 2012. PubTator: A PubMed-like interactive curation system for document triage and literature curation. In Proceedings of the BioCreative 2012 Workshop (Washington, DC, USA, 2012). BioCreative
[6]
Wei, C.-H., Harris, B. R., Li, D., Berardini, T. Z., Huala, E., Kao, H.-Y. and Lu, Z. 2012. Accelerating literature curation with text-mining tools: a case study of using PubTator to curate genes in abstracts. Database (Oxford) 2012 (2012), bas041. DOI=http://dx.doi.org/10.1093/database/bas041
[7]
Wei, C.-H., Harris, B. R., Kao, H.-Y. and Lu, Z. 2013. tmVar: A text mining approach for extracting sequence variants in biomedical literature. Bioinformatics 29, 11 (2013), 1433--1439. DOI=http://dx.doi.org/10.1093/bioinformatics/btt156
[8]
Leaman, R., Wei, C.-H. and Lu, Z. 2013. NCBI at the BioCreative IV CHEMDNER Task: Recognizing chemical names in articles with tmChem. In Proceedings of the BioCreative Challenge Evaluation Workshop vol (Bethesda, Maryland, USA, 2013). BioCreative
[9]
Leaman, R., Doğan, R. I. and Lu, Z. 2013. DNorm: disease name normalization with pairwise learning to rank. Bioinformatics 29, 22 (2013), 2909--2917. DOI=http://dx.doi.org/10.1093/bioinformatics/btt474
[10]
Wei, C.-H., Kao, H.-Y. and Lu, Z. 2012. SR4GN: a species recognition software tool for gene normalization. PLoS ONE 7, 6 (2012), e38460. DOI=http://dx.doi.org/10.1371/journal.pone.0038460
[11]
Wei, C.-H. and Kao, H.-Y. 2011. Cross-species gene normalization by species inference. BMC Bioinformatics 12, Suppl 8 (2011), S5. DOI=http://dx.doi.org/10.1186/1471-2105-12-S8-S5

Published In

BCB '14: Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
September 2014
851 pages
ISBN: 9781450328944
DOI: 10.1145/2649387
General Chairs: Pierre Baldi, Wei Wang

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. biocuration
2. biomedical text mining
3. named entity normalization
4. named entity recognition

Conference

BCB '14: ACM-BCB '14
September 20-23, 2014
Newport Beach, California

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%
