A Transparent and Adaptable Method to Extract Colonoscopy and Pathology Data Using Natural Language Processing

Journal of Medical Systems

Abstract

Key variables recorded as text in colonoscopy and pathology reports have been extracted using natural language processing (NLP) tools that were not easily adaptable to new settings. We aimed to develop a reliable NLP tool with broad adaptability. During 1996–2016, Kaiser Permanente Northern California performed 401,566 colonoscopies with linked pathology. We randomly sampled 1000 linked reports into a Training Set and developed an NLP tool using SAS® PERL regular expressions. The NLP tool captured five colonoscopy and pathology variables: type, size, and location of polyps; extent of procedure; and quality of bowel preparation. We used a Validation Set (N = 3000) to confirm the variables’ classifications using manual chart review as the reference. Performance of the NLP tool was assessed using the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and Cohen’s κ. Cohen’s κ ranged from 93 to 99%. The sensitivity and specificity ranged from 95 to 100% across all categories. For categories with prevalence exceeding 10%, the PPV ranged from 97% to 100% except for adequate quality of preparation (prevalence 92%), for which the PPV was 65%. For categories with prevalence below 10%, the PPVs ranged from 62% to 100%. NPVs ranged from 94% to 100% except for the “complete” extent of procedure, for which the NPV was 73%. Using information from a large community-based population, we developed a transparent and adaptable NLP tool for extracting five colonoscopy and pathology variables. The tool can be readily tested in other healthcare settings.
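The workflow described above (regular-expression extraction, then comparison against manual chart review) can be illustrated with a minimal sketch. This is not the authors' tool, which was written with SAS® PERL regular expressions; it is a Python approximation under stated assumptions. The pattern `SIZE_RE`, the helper names `largest_size_mm` and `agreement_metrics`, and the toy inputs are all hypothetical and chosen only for illustration.

```python
import math
import re

# Hypothetical pattern (not from the article): a number followed by "mm" or "cm".
SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm)\b", re.IGNORECASE)


def largest_size_mm(report_text):
    """Return the largest size mentioned in the text, in millimetres, or None."""
    sizes = []
    for value, unit in SIZE_RE.findall(report_text):
        factor = 10.0 if unit.lower() == "cm" else 1.0
        sizes.append(float(value) * factor)
    return max(sizes) if sizes else None


def _ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def agreement_metrics(predicted, reference):
    """Compare boolean NLP labels with boolean chart-review labels."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    n = tp + fp + fn + tn
    observed = _ratio(tp + tn, n)
    # Agreement expected by chance, from the marginal totals (Cohen's kappa).
    expected = _ratio((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn), n * n)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "kappa": _ratio(observed - expected, 1 - expected),
    }


if __name__ == "__main__":
    # Toy sentence, not taken from any real report.
    print(largest_size_mm("Two sessile polyps, 4 mm and 1.2 cm, in the sigmoid colon."))
    # Toy labels: NLP output versus manual review for eight reports.
    nlp = [True, True, False, False, True, False, False, False]
    chart = [True, True, False, False, False, True, False, False]
    print(agreement_metrics(nlp, chart))
```

Because PPV and NPV depend on how common a category is in the sample, while sensitivity and specificity do not, reporting all four alongside Cohen's κ, as the abstract does, gives a fuller picture than any single measure.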

Abbreviations

NLP: natural language processing
CRC: colorectal cancer
SP: serrated polyp
SSA: sessile serrated adenoma
SSP: sessile serrated polyp
HP: hyperplastic polyp
TSA: traditional serrated adenoma
NPV: negative predictive value
PPV: positive predictive value
CI: confidence interval

Author information

Corresponding author

Correspondence to Lisa J. Herrinton.

Ethics declarations

Financial support

This research was supported by Kaiser Permanente Northern California Division of Research Physician Researcher Program Funding.

Conflict of interest

None.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Ethics

Institutional Review Board approval was obtained from the Kaiser Permanente Northern California Institutional Review Board.

Guarantor of the article

Drs. Dan Li and Lisa Herrinton take full responsibility for the conduct of the study, had access to the data, and had control of the decision to publish.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection on Education & Training

Electronic supplementary material

ESM 1

(DOCX 47 kb)

About this article

Cite this article

Fevrier, H.B., Liu, L., Herrinton, L.J. et al. A Transparent and Adaptable Method to Extract Colonoscopy and Pathology Data Using Natural Language Processing. J Med Syst 44, 151 (2020). https://doi.org/10.1007/s10916-020-01604-8

  • DOI: https://doi.org/10.1007/s10916-020-01604-8
