Novelty Detection in Patient Histories: Experiments with Measures Based on Text Compression

Edsberg, Ole; Nytrø, Øystein; Røst, Thomas Brox

doi:10.1007/978-3-540-74825-0_33

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4723)

Abstract

Reviewing a patient history can be very time-consuming, partly because of the large number of consultation notes. Often, most of the notes contain little new information. Tools facilitating this and other tasks could be constructed if we could automatically detect the novel notes. We propose the use of measures based on text compression, as an approximation of Kolmogorov complexity, for classifying note novelty. We define four compression-based and eight other measures. We evaluate their ability to predict the presence of previously unseen diagnosis codes associated with the notes in patient histories from general practice. The best measures show promising classification ability, which, while not enough to serve alone as a clinical tool, might be useful as part of a system taking more data types into account. The best individual measure was the normalized asymmetric compression distance between the concatenated prior notes and the current note.
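
The idea behind a compression-based novelty measure can be illustrated with a minimal sketch. The abstract does not give the formula for the normalized asymmetric compression distance, so the definition used here, (C(prior + current) - C(prior)) / C(current) with C the compressed size, is an assumption made for illustration, as are the choice of zlib as compressor and the function names; none of this should be read as the authors' implementation.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input, a rough stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def asymmetric_compression_distance(prior: str, current: str) -> float:
    """Assumed definition (not taken from the paper):
    (C(prior + current) - C(prior)) / C(current).

    A low value means the current note adds little that the compressor
    could not already predict from the prior text.
    """
    p = prior.encode("utf-8")
    c = current.encode("utf-8")
    extra = compressed_size(p + c) - compressed_size(p)
    return extra / compressed_size(c)


def novelty_scores(notes: list[str]) -> list[float]:
    """Score every note after the first against the concatenation of all earlier notes."""
    scores = []
    for i in range(1, len(notes)):
        prior = "\n".join(notes[:i])
        scores.append(asymmetric_compression_distance(prior, notes[i]))
    return scores


if __name__ == "__main__":
    # Hypothetical notes, invented purely for this example.
    history = [
        "Patient complains of cough and fever. Prescribed rest and fluids.",
        "Patient complains of cough and fever. Prescribed rest and fluids.",
        "Fractured left wrist after fall on ice; referred to radiology.",
    ]
    for score in novelty_scores(history):
        print(round(score, 3))
```

Under these assumptions, a note that repeats earlier text should score close to zero, while a note on an unrelated problem should score much higher. A threshold on the score, or a classifier trained on it, would then flag candidate novel notes.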

Author information

Authors and Affiliations

Norwegian University of Science and Technology, Department of Computer and Information Science, Sem Sælands vei 7-9, NO-7491 Trondheim, Norway
Ole Edsberg, Øystein Nytrø & Thomas Brox Røst

Editor information

Michael R. Berthold, John Shawe-Taylor, Nada Lavrač

About this paper

Cite this paper

Edsberg, O., Nytrø, Ø., Røst, T.B. (2007). Novelty Detection in Patient Histories: Experiments with Measures Based on Text Compression. In: Berthold, M.R., Shawe-Taylor, J., Lavrač, N. (eds) Advances in Intelligent Data Analysis VII. IDA 2007. Lecture Notes in Computer Science, vol 4723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74825-0_33

DOI: https://doi.org/10.1007/978-3-540-74825-0_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74824-3
Online ISBN: 978-3-540-74825-0
eBook Packages: Computer Science, Computer Science (R0)
