DOI: 10.1145/3307339.3343246
poster

A Text-Mining System for Concept Annotation in Biomedical Full Text Articles

Published: 04 September 2019

Abstract

PubTator Central (https://www.ncbi.nlm.nih.gov/research/pubtator/) [1] is a web service for exploring and retrieving bioconcept annotations in full-text biomedical articles. PubTator Central (PTC) provides automated annotations from state-of-the-art text mining systems for genes/proteins, genetic variants, diseases, chemicals, species and cell lines, all available for immediate download. PTC annotates PubMed (30 million abstracts), the PMC Open Access Subset and the Author Manuscript Collection (3 million full-text articles). These full-text articles increase the total number of annotations nearly four-fold. The new PTC web interface features semantic search and faceted shortcuts to improve navigation in full text. A significantly redesigned back end that makes heavy use of nonrelational data increases throughput and speed despite the large increase in data volume. Updated entity identification methods and a new disambiguation module based on deep learning techniques provide increased accuracy. The PTC web interface allows users to easily navigate through bioentities present in full-text articles, build full-text document collections and visualize concept annotations in each document. Annotations are downloadable in multiple formats (XML, JSON and tab-delimited) via the online interface, a RESTful web service and bulk FTP. PTC is synchronized with PubMed and PubMed Central, with new articles added daily. The original PubTator [2] service has served annotated abstracts for ~300 million requests, enabling third-party research in use cases such as biocuration support, gene prioritization, genetic disease analysis, and literature-based knowledge discovery. We demonstrate that the full-text results in PTC significantly increase biomedical concept coverage, and we anticipate that this expansion will both enhance existing downstream applications and enable new use cases.
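
The abstract lists a tab-delimited download format but does not describe its layout. As an illustration only, the sketch below assumes the commonly used PubTator layout: title and abstract lines of the form `ID|t|...` and `ID|a|...`, followed by annotation rows with tab-separated document ID, start offset, end offset, mention text, concept type and concept identifier. The names `Annotation`, `parse_pubtator` and `SAMPLE` are hypothetical, and the sample record is made up for demonstration rather than taken from PTC output.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Annotation:
    doc_id: str
    start: int          # character offset where the mention begins
    end: int            # character offset just past the mention
    mention: str        # surface text as it appears in the article
    concept_type: str   # e.g. Gene, Disease, Chemical
    identifier: str     # normalized concept ID; may be empty


def parse_pubtator(text: str) -> Dict[str, List[Annotation]]:
    """Group annotation rows by document ID (assumed PubTator layout)."""
    by_doc: Dict[str, List[Annotation]] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        # Title/abstract lines ("ID|t|...", "ID|a|...") and blank lines
        # have no tab-separated offsets, so they are skipped here.
        if len(fields) < 5:
            continue
        # Skip any row whose offset columns are not integers.
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        doc_id, start, end, mention, concept_type = fields[:5]
        identifier = fields[5] if len(fields) > 5 else ""
        by_doc.setdefault(doc_id, []).append(
            Annotation(doc_id, int(start), int(end), mention,
                       concept_type, identifier)
        )
    return by_doc


# Made-up record for illustration; not real PTC output.
SAMPLE = (
    "99999999|t|BRCA1 mutations in breast cancer\n"
    "99999999|a|We studied BRCA1 in patients with breast cancer.\n"
    "99999999\t0\t5\tBRCA1\tGene\t672\n"
    "99999999\t19\t32\tbreast cancer\tDisease\tMESH:D001943\n"
)

if __name__ == "__main__":
    for doc_id, annotations in parse_pubtator(SAMPLE).items():
        for a in annotations:
            print(doc_id, a.concept_type, a.mention, a.identifier)
```

Grouping by document ID keeps the parser usable on bulk downloads that concatenate many articles in one file; the actual export produced by the service should be checked against its documentation before relying on this layout.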

References

[1] Wei, C.H., Allot, A., Leaman, R., and Lu, Z. (2019) PubTator Central: Automated Concept Annotation for Biomedical Full Text Articles. Nucleic Acids Res. (Web Server issue)
[2] Wei, C.H., Kao, H.Y., and Lu, Z. (2013) PubTator: a Web-based text mining tool for assisting Biocuration. Nucleic Acids Res., 41, W518-W522

Cited By

  • (2019) Biomedical Mention Disambiguation using a Deep Learning Approach. Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 307-313. DOI: 10.1145/3307339.3342162. Online publication date: 7-Sep-2019

    Published In

    BCB '19: Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
    September 2019
    716 pages
    ISBN:9781450366663
    DOI:10.1145/3307339
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. biocuration
    2. name entity recognition
    3. natural language processing
    4. pubtator

    Qualifiers

    • Poster

    Conference

    BCB '19

    Acceptance Rates

    BCB '19 Paper Acceptance Rate 42 of 157 submissions, 27%;
    Overall Acceptance Rate 254 of 885 submissions, 29%
