Two Applications of Statistical Modelling to Natural Language Processing


Part of the book series: Lecture Notes in Statistics (LNS, volume 112)

Abstract

Each week the Columbia-Presbyterian Medical Center collects several megabytes of English text transcribed from radiologists’ dictation and notes of their interpretations of medical diagnostic x-rays. It is desired to automate the extraction of diagnoses from these natural language reports. This paper reports on two aspects of this project requiring advanced statistical methods. First, the identification of pairs of words and phrases that tend to appear together (collocate) uses a hierarchical Bayesian model that adjusts to different word and word pair distributions in different bodies of text. Second, we present an analysis of data from experiments to compare the performance of the computer diagnostic program to that of a panel of physician and lay readers of randomly sampled texts. A measure of inter-subject distance with respect to the diagnoses is defined for which estimated variances and covariances are easily computed. This allows statistical conclusions about the similarities and dissimilarities among diagnoses by the various programs and experts.
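
The abstract does not give the formula for the inter-subject distance, so the following is only a minimal sketch under a stated assumption: the distance between two readers is taken to be the average number of diagnoses per report on which they disagree. The function names, reader labels, and condition names are hypothetical, and the variance and covariance estimation described in the chapter is not reproduced here.

```python
from itertools import combinations


def pairwise_distance(a, b):
    """Mean number of diagnoses per report on which two readers disagree.

    `a` and `b` are equal-length lists; element i is the set of conditions
    that the reader assigned to report i. ASSUMPTION: this disagreement
    count is an illustrative stand-in, not the chapter's exact measure.
    """
    if len(a) != len(b):
        raise ValueError("both readers must cover the same reports")
    if not a:
        raise ValueError("at least one report is required")
    # Symmetric difference = conditions named by exactly one of the readers.
    total = sum(len(set(x) ^ set(y)) for x, y in zip(a, b))
    return total / len(a)


def distance_matrix(readings):
    """Return {(reader1, reader2): distance} for every ordered pair."""
    names = sorted(readings)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        d = pairwise_distance(readings[x], readings[y])
        dist[(x, y)] = dist[(y, x)] = d
    return dist


# Hypothetical toy data: three readers, three reports.
readings = {
    "program": [{"pneumonia"}, set(), {"effusion"}],
    "physician": [{"pneumonia"}, {"effusion"}, {"effusion"}],
    "lay": [set(), {"effusion"}, {"pneumonia", "effusion"}],
}

if __name__ == "__main__":
    for pair, d in sorted(distance_matrix(readings).items()):
        print(pair, round(d, 3))
```

On this toy data the program sits closer to the physician than to the lay reader. Deciding whether such a gap is statistically meaningful is exactly what the estimated variances and covariances mentioned in the abstract are for.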

Copyright information

© 1996 Springer-Verlag New York, Inc.

About this chapter

Cite this chapter

DuMouchel, W., Friedman, C., Hripcsak, G., Johnson, S.B., Clayton, P.D. (1996). Two Applications of Statistical Modelling to Natural Language Processing. In: Fisher, D., Lenz, H.-J. (eds) Learning from Data. Lecture Notes in Statistics, vol 112. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2404-4_39

  • DOI: https://doi.org/10.1007/978-1-4612-2404-4_39

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-0-387-94736-5

  • Online ISBN: 978-1-4612-2404-4

  • eBook Packages: Springer Book Archive
