Novelty Detection in Patient Histories: Experiments with Measures Based on Text Compression

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4723)

Abstract

Reviewing a patient history can be very time-consuming, partly because of the large number of consultation notes, most of which often contain little new information. Tools supporting this and other tasks could be built if novel notes could be detected automatically. We propose using measures based on text compression, as an approximation of Kolmogorov complexity, to classify note novelty. We define four compression-based measures and eight other measures, and evaluate their ability to predict the presence of previously unseen diagnosis codes associated with the notes in patient histories from general practice. The best measures show promising classification ability which, while not sufficient to serve alone as a clinical tool, might be useful as part of a system that takes more data types into account. The best individual measure was the normalized asymmetric compression distance between the concatenated prior notes and the current note.
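To make the idea of a compression-based novelty measure concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the formula, so the definition used here, (C(prior + current) - C(prior)) / C(current) with C the compressed size in bytes, is an assumption; the choice of `zlib` as compressor, the newline separator, and the function names `compressed_size`, `asymmetric_compression_distance`, and `note_novelty` are likewise hypothetical.

```python
import zlib
from typing import Sequence


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of `text`."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def asymmetric_compression_distance(prior: str, current: str) -> float:
    """Assumed definition: extra bytes needed to encode `current` once
    `prior` is known, normalized by the cost of encoding `current` alone.
    Values near 0 suggest redundancy; values near 1 suggest novelty."""
    extra = compressed_size(prior + current) - compressed_size(prior)
    return extra / compressed_size(current)


def note_novelty(prior_notes: Sequence[str], current_note: str) -> float:
    """Score `current_note` against the concatenation of all prior notes."""
    # Joining with newlines is an illustrative choice, not from the paper.
    prior = "".join(note + "\n" for note in prior_notes)
    return asymmetric_compression_distance(prior, current_note)


if __name__ == "__main__":
    # Invented example notes, for illustration only.
    history = [
        "Patient reports persistent cough and mild fever. Prescribed rest and fluids.",
        "Follow-up: cough persists, no fever. Continue rest and fluids.",
    ]
    repeated = "Follow-up: cough persists, no fever. Continue rest and fluids."
    different = "Acute pain in left knee after fall from ladder; swelling noted, referred for X-ray imaging."
    print("repeated note:", note_novelty(history, repeated))
    print("different note:", note_novelty(history, different))
```

In such a sketch, a note that repeats earlier text should score lower than one introducing new content, and a threshold on the score could then act as a simple novelty classifier. General-purpose compressors have limited look-back windows (32 KB for zlib), so long histories may need a different compressor.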

Editor information

Michael R. Berthold, John Shawe-Taylor, Nada Lavrač

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Edsberg, O., Nytrø, Ø., Røst, T.B. (2007). Novelty Detection in Patient Histories: Experiments with Measures Based on Text Compression. In: Berthold, M.R., Shawe-Taylor, J., Lavrač, N. (eds) Advances in Intelligent Data Analysis VII. IDA 2007. Lecture Notes in Computer Science, vol 4723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74825-0_33

  • DOI: https://doi.org/10.1007/978-3-540-74825-0_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74824-3

  • Online ISBN: 978-3-540-74825-0

  • eBook Packages: Computer Science, Computer Science (R0)
