Scaling Unsupervised Risk Stratification to Massive Clinical Datasets

Zeeshan Syed, Ilan Rubinfeld
Copyright: © 2011 | Volume: 2 | Issue: 1 | Pages: 15
ISSN: 1947-9115 | EISSN: 1947-9123 | EISBN13: 9781613508152 | DOI: 10.4018/jkdb.2011010103

MLA

Syed, Zeeshan, and Ilan Rubinfeld. "Scaling Unsupervised Risk Stratification to Massive Clinical Datasets." IJKDB vol.2, no.1 2011: pp.45-59. http://doi.org/10.4018/jkdb.2011010103

APA

Syed, Z. & Rubinfeld, I. (2011). Scaling Unsupervised Risk Stratification to Massive Clinical Datasets. International Journal of Knowledge Discovery in Bioinformatics (IJKDB), 2(1), 45-59. http://doi.org/10.4018/jkdb.2011010103

Chicago

Syed, Zeeshan, and Ilan Rubinfeld. "Scaling Unsupervised Risk Stratification to Massive Clinical Datasets," International Journal of Knowledge Discovery in Bioinformatics (IJKDB) 2, no.1: 45-59. http://doi.org/10.4018/jkdb.2011010103

Abstract

While rare clinical events, by definition, occur infrequently in a population, their consequences can be drastic. Unfortunately, developing risk stratification algorithms for these conditions requires large volumes of data to capture enough positive and negative cases, and collecting such data is slow, expensive, and burdensome to both patients and caregivers. This paper proposes an unsupervised machine learning approach that addresses this challenge by risk-stratifying patients for adverse outcomes without the use of a priori knowledge or labeled training data. The key idea is to identify high-risk patients as anomalies in a population. Cases are identified through a novel algorithm that finds an approximate solution to the k-nearest neighbor problem using locality-sensitive hashing (LSH) based on p-stable distributions. The algorithm is optimized to use multiple LSH searches, each with a geometrically increasing radius, to find the k-nearest neighbors of patients in a dynamically changing dataset where patients are added or removed over time. When evaluated on data from the National Surgical Quality Improvement Program (NSQIP), this approach successfully identifies patients at an elevated risk of mortality and rare morbidities. The LSH-based algorithm provides a substantial improvement in runtime over an exact k-nearest neighbor algorithm while achieving similar accuracy.
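
To make the idea in the abstract concrete, the following is a minimal Python sketch of approximate k-nearest-neighbor search using p-stable LSH with geometrically increasing radii. It is one possible reading of the description above, not the authors' implementation. Everything not stated in the abstract is an assumption made for illustration: the Gaussian (2-stable) projections for Euclidean distance, the bucket width of 4 times the radius, the number of tables and hash functions, the synthetic data, and the use of the mean distance to the k approximate neighbors as the anomaly score. All class and function names are hypothetical.

```python
import math
import random
from collections import defaultdict


class PStableLSH:
    """One LSH index for Euclidean distance using Gaussian (2-stable) projections."""

    def __init__(self, dim, width, n_tables=8, n_hashes=4, seed=0):
        rng = random.Random(seed)
        self.width = width
        # Each table concatenates n_hashes functions h(v) = floor((a.v + b) / width).
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
             for _ in range(n_hashes)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(set) for _ in range(n_tables)]

    def _key(self, t, v):
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.width)
            for a, b in self.funcs[t]
        )

    def add(self, idx, v):
        for t, table in enumerate(self.tables):
            table[self._key(t, v)].add(idx)

    def remove(self, idx, v):
        # Supports a changing dataset: drop a point without rebuilding the index.
        for t, table in enumerate(self.tables):
            table[self._key(t, v)].discard(idx)

    def candidates(self, v):
        found = set()
        for t, table in enumerate(self.tables):
            found |= table.get(self._key(t, v), set())
        return found


def build_indexes(points, r0=0.5, growth=2.0, levels=6, seed=0):
    """Build one index per radius r0, r0*growth, r0*growth**2, ... (assumed schedule)."""
    dim = len(next(iter(points.values())))
    indexes = []
    for level in range(levels):
        radius = r0 * growth ** level
        index = PStableLSH(dim, width=4.0 * radius, seed=seed + level)
        for idx, v in points.items():
            index.add(idx, v)
        indexes.append((radius, index))
    return indexes


def approx_knn(points, indexes, query_idx, k):
    """Search radii in increasing order; stop once k candidates lie within the radius."""
    q = points[query_idx]
    best = []
    for radius, index in indexes:
        cands = index.candidates(q) - {query_idx}
        best = sorted((math.dist(q, points[i]), i) for i in cands)[:k]
        if len(best) >= k and best[-1][0] <= radius:
            break
    return best


def anomaly_score(points, indexes, idx, k=5):
    """Assumed scoring rule: mean distance to the approximate k nearest neighbors."""
    nbrs = approx_knn(points, indexes, idx, k)
    if not nbrs:
        return math.inf  # no neighbor found at any radius
    return sum(d for d, _ in nbrs) / len(nbrs)


def make_demo_points(n=200, dim=5, seed=1):
    """Synthetic data: one Gaussian cluster plus a single distant point (id n)."""
    rng = random.Random(seed)
    points = {i: [rng.gauss(0.0, 1.0) for _ in range(dim)] for i in range(n)}
    points[n] = [10.0] * dim
    return points
```

In this sketch, a point that has few close neighbors only finds candidates at the larger radii (or none at all), so it receives a high score, whereas points in dense regions terminate at a small radius. Ranking all points by `anomaly_score` then gives an unsupervised ordering in which the most isolated points come first.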
