Abstract
Reviewing a patient history can be very time-consuming, partly because of the large number of consultation notes, most of which often contain little new information. Tools that support this and other tasks could be built if novel notes could be detected automatically. We propose using measures based on text compression, as an approximation of Kolmogorov complexity, to classify note novelty. We define four compression-based measures and eight other measures, and evaluate their ability to predict the presence of previously unseen diagnosis codes associated with the notes in patient histories from general practice. The best measures show promising classification ability which, while not sufficient to serve alone as a clinical tool, might be useful as part of a system that takes more data types into account. The best individual measure was the normalized asymmetric compression distance between the concatenated prior notes and the current note.
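The abstract does not give the formula for the normalized asymmetric compression distance, so the following is only a minimal sketch under stated assumptions. It assumes the measure has the form (C(xy) - C(x)) / C(y), where x is the concatenation of the prior notes, y is the current note, and C is the compressed size in bytes. The function names, the newline separator, and the choice of zlib as the compressor are all hypothetical choices made for illustration, not the authors' implementation.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def asymmetric_compression_distance(prior_notes: list[str], current_note: str) -> float:
    """Assumed form: (C(history + note) - C(history)) / C(note).

    A low value suggests the current note is largely predictable from
    the prior notes; a high value suggests it carries new content.
    """
    # Assumption: prior notes are joined with newlines into one history string.
    history = "\n".join(prior_notes)
    c_history = compressed_size(history)
    c_joint = compressed_size(history + "\n" + current_note)
    # zlib output always includes a header and checksum, so this is never zero.
    c_note = compressed_size(current_note)
    return (c_joint - c_history) / c_note


# Invented example notes, not data from the paper.
history = [
    "Patient reports persistent cough and mild fever. Prescribed rest and fluids.",
    "Follow-up: cough improving, no fever. Continue rest.",
]
repeated_note = "Follow-up: cough improving, no fever. Continue rest."
novel_note = "New complaint: sharp pain in left knee after fall, swelling observed."

repeated_score = asymmetric_compression_distance(history, repeated_note)
novel_score = asymmetric_compression_distance(history, novel_note)
print(f"repeated: {repeated_score:.3f}, novel: {novel_score:.3f}")
```

In this sketch a note that repeats earlier text should score lower than one introducing new wording, because the compressor can encode the repeated note as a reference to the history. A general-purpose compressor such as zlib has a limited look-back window (32 KB for DEFLATE), so a long history would need a different compressor or truncation.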
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Edsberg, O., Nytrø, Ø., Røst, T.B. (2007). Novelty Detection in Patient Histories: Experiments with Measures Based on Text Compression. In: Berthold, M.R., Shawe-Taylor, J., Lavrač, N. (eds) Advances in Intelligent Data Analysis VII. IDA 2007. Lecture Notes in Computer Science, vol 4723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74825-0_33
DOI: https://doi.org/10.1007/978-3-540-74825-0_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74824-3
Online ISBN: 978-3-540-74825-0