Data mining from a patient safety database: the lessons learned

Bentham, James; Hand, David J.

doi:10.1007/s10618-011-0225-y

Published: 09 June 2011

Volume 24, pages 195–217 (2012)
Data Mining and Knowledge Discovery

James Bentham¹ & David J. Hand²,³

Abstract

The issue of patient safety is an extremely important one; each year in the UK, hundreds of thousands of people suffer due to some sort of incident that occurs whilst they are in National Health Service care. The National Patient Safety Agency (NPSA) works to try to reduce the scale of the problem. One of its major projects is to collect a very large dataset, the Reporting and Learning System (RLS), which describes several million of these incidents. The RLS is used as the basis for research by the NPSA. However, the NPSA has identified a gap in their work between high-level quantitative analysis and detailed, manual analysis of small samples. This paper describes the lessons learned from a knowledge discovery process that attempted to fill this gap. The RLS contains a free text description of each incident. A high dimensional model of the text is calculated, using the vector space model with term weighting applied. Dimensionality reduction techniques are used to produce the final models of the text. These models are examined using an anomaly detection tool to find groups of incidents that should be coherent in meaning, and that might be of interest to the NPSA. A three stage process is developed for assessing the results. The first stage uses a quantitative measure based on the use of planted groups of known interest, the second stage involves manual filtering by a non-expert, and the third stage is assessment by clinical experts.
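
The text-modelling step described in the abstract can be illustrated with a minimal sketch. It assumes a plain TF-IDF weighting and cosine similarity; the abstract does not say which term weighting, dimensionality reduction technique, or anomaly detection tool the authors used, so none of those is reproduced here. The incident descriptions are invented for illustration and are not taken from the RLS, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer (an assumption): lowercase alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document, scaled to unit length."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Term weighting: raw term frequency times log inverse document frequency.
        vec = {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm > 0 else {})
    return vectors


def cosine(u, v):
    # Both vectors have unit length, so the dot product equals the cosine.
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


# Invented free-text descriptions, standing in for incident reports.
incidents = [
    "patient fell from bed during the night",
    "patient fell out of bed at night",
    "wrong dose of insulin given to patient",
    "delay in ambulance arrival",
]

vectors = tfidf_vectors(incidents)

# Rank every pair of incidents by similarity, most similar first.
pairs = sorted(
    ((cosine(vectors[i], vectors[j]), i, j)
     for i in range(len(incidents))
     for j in range(i + 1, len(incidents))),
    reverse=True,
)
most_similar = pairs[0]
print(most_similar)
```

In this toy example the two descriptions of a fall from bed form the closest pair, which is the kind of coherent group the process aims to surface. At the scale of several million reports, the vectors would need to be reduced in dimension before any such comparison, as the abstract notes.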

Author information

Authors and Affiliations

Department of Medical and Molecular Genetics, King’s College, London, UK
James Bentham
Department of Mathematics, Imperial College, London, UK
David J. Hand
Institute for Mathematical Sciences, Imperial College, London, UK
David J. Hand

Corresponding author

Correspondence to James Bentham.

Additional information

Responsible editor: Eamonn Keogh.

About this article

Cite this article

Bentham, J., Hand, D.J. Data mining from a patient safety database: the lessons learned. Data Min Knowl Disc 24, 195–217 (2012). https://doi.org/10.1007/s10618-011-0225-y

Received: 21 December 2009
Accepted: 23 May 2011
Published: 09 June 2011
Issue Date: January 2012
DOI: https://doi.org/10.1007/s10618-011-0225-y
