
Effective sparse imputation of patient conditions in electronic medical records for emergency risk predictions

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Electronic medical records (EMRs) are increasingly used for “risk” prediction, where “risks” denote outcomes such as emergency presentation, readmission, and length of hospitalization. EMR data analysis, however, is complicated by missing entries, for two reasons: the “primary reason for admission” is coded in the EMR, but comorbidities (other chronic diseases) are often left uncoded; and many zero values in the data are accurate, reflecting that a patient has not accessed medical facilities. A key challenge is to deal with the peculiarities of these data: unlike many other datasets, EMR data are sparse, reflecting the fact that patients have some, but not all, diseases. We propose a novel model to fill in these missing values and use the new representation to predict key hospital events. To fill in missing values, we represent the feature-patient matrix as a product of two low-rank factors, preserving sparsity in the product. Intuitively, this product regularization allows sparse imputation of patient conditions that reflects comorbidities common across patients. We develop a scalable optimization algorithm based on block coordinate descent to find an optimal solution. We evaluate the proposed framework on two real-world EMR cohorts: Cancer (7000 admissions) and Acute Myocardial Infarction (AMI, 2652 admissions). Our results show that the AUC for 3-month emergency presentation prediction improves significantly, from 0.729 to 0.741 on the Cancer data and from 0.699 to 0.723 on the AMI data; similarly, the AUC for 3-month emergency admission prediction improves from 0.730 to 0.752 on the Cancer data and from 0.682 to 0.724 on the AMI data. We also extend the proposed method to a supervised model that predicts multiple related risk outcomes (e.g., emergency presentations and admissions over 3-, 6-, and 12-month periods) in an integrated framework. The supervised model consistently outperforms state-of-the-art baseline methods.
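The sparsity-preserving low-rank imputation described above can be sketched in a few lines. This is an illustrative alternating-least-squares approximation with a final soft-threshold, not the authors' exact block coordinate descent algorithm; the function and parameter names (`sparse_impute`, `rank`, `lam`, `lam_uv`) are assumptions for the sketch:

```python
import numpy as np

def sparse_impute(X, rank=5, lam=0.1, lam_uv=0.05, iters=50, seed=0):
    """Illustrative sketch: approximate the feature-patient matrix X by a
    low-rank product U @ V, then soft-threshold the product so the imputed
    matrix stays sparse and nonnegative. Not the paper's exact algorithm."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = rng.standard_normal((n, rank))
    V = rng.standard_normal((rank, d))
    for _ in range(iters):
        # Ridge-regularized alternating least squares on the two factors
        V = np.linalg.solve(U.T @ U + lam * np.eye(rank), U.T @ X)
        U = np.linalg.solve(V @ V.T + lam * np.eye(rank), V @ X.T).T
    Z = U @ V
    # Soft-threshold, then clip, to keep the imputation sparse and nonnegative
    Z = np.maximum(np.abs(Z) - lam_uv, 0.0) * np.sign(Z)
    return np.maximum(Z, 0.0)
```

Because the regularized product shares structure across patients, a patient with a zeroed comorbidity entry can receive a nonzero imputed value when similar patients carry that condition.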


Notes

  1. http://apps.who.int/classifications/icd10/browse/2010/en.

  2. Ethics approval obtained through the University and the Hospital (12/83).


Acknowledgements

This work is partially supported by the Telstra-Deakin Center of Excellence in Big Data and Machine Learning.

Author information

Corresponding author

Correspondence to Budhaditya Saha.

Appendix

Optimization steps for the supervised framework modeling multiple risk outcomes: the formulation of the framework can be expressed as

$$\begin{aligned}&\min _{\varvec{\mathbf {U}},\varvec{\mathbf {V}},\varvec{\mathbf {E}},\varvec{\mathbf {Z}}\ge \mathbf {0}}||\varvec{\mathbf {E}}||_{F}^{2}+\frac{\lambda _{y}}{2}||\varvec{\mathbf {Y}}-\mathbf {H}^{T}\mathbf {G}||_{F}^{2}+\lambda _{uv}||\varvec{\mathbf {Z}}||_{1}+\frac{\lambda }{2}\bigg [||\varvec{\mathbf {U}}||_{F}^{2}+||\varvec{\mathbf {V}}||_{F}^{2}\bigg ]+\lambda _{w}||\mathbf {W}||_{*}\nonumber \\&\quad \hbox {s.t. }\varvec{\mathbf {X}}-\varvec{\mathbf {U}}\varvec{\mathbf {V}}=\varvec{\mathbf {E}},\quad \varvec{\mathbf {U}}\varvec{\mathbf {V}}=\varvec{\mathbf {Z}},\quad \mathbf {H}=\varvec{\mathbf {U}}\varvec{\mathbf {V}},\quad \mathbf {G}=\mathbf {W}\end{aligned}$$
(27)

The Augmented Lagrangian of the above formulation can be expressed as

$$\begin{aligned} \mathcal {F}&=||\varvec{\mathbf {E}}||_{F}^{2}+\frac{\lambda _{y}}{2}||\varvec{\mathbf {Y}}-\mathbf {H}^{T}\mathbf {G}||_{F}^{2}+\lambda _{uv}||\varvec{\mathbf {Z}}||_{1}+\frac{\lambda }{2}\bigg [||\varvec{\mathbf {U}}||_{F}^{2}+||\varvec{\mathbf {V}}||_{F}^{2}\bigg ]+\lambda _{w}||\mathbf {W}||_{*}\nonumber \\&\quad +\text {tr}(\mathbf {L}^{T}(\varvec{\mathbf {X}}-\varvec{\mathbf {U}}\varvec{\mathbf {V}}-\varvec{\mathbf {E}}))+\frac{\beta }{2}||\varvec{\mathbf {U}}\varvec{\mathbf {V}}+\varvec{\mathbf {E}}-\varvec{\mathbf {X}}||_{F}^{2}+\text {tr}(\mathbf {Q}^{T}(\varvec{\mathbf {Z}}-\varvec{\mathbf {U}}\varvec{\mathbf {V}}))+\frac{\beta }{2}||\varvec{\mathbf {U}}\varvec{\mathbf {V}}-\varvec{\mathbf {Z}}||_{F}^{2}\nonumber \\&\quad +\text {tr}(\varvec{{{\varLambda }}}_{1}^{T}(\mathbf {H}-\varvec{\mathbf {U}}\varvec{\mathbf {V}}))+\frac{\beta }{2}||\varvec{\mathbf {U}}\varvec{\mathbf {V}}-\mathbf {H}||_{F}^{2}+\text {tr}(\varvec{{{\varLambda }}}_{2}^{T}(\mathbf {W}-\mathbf {G}))+\frac{\beta }{2}||\mathbf {G}-\mathbf {W}||_{F}^{2} \end{aligned}$$
(28)
$$\begin{aligned} \hbox {s.t.}&\quad \varvec{\mathbf {Z}}\ge \mathbf {0} \end{aligned}$$
(29)

where \(\mathbf {L}\), \(\mathbf {Q}\), \(\varvec{{{\varLambda }}}_{1}\) and \(\varvec{{{\varLambda }}}_{2}\) are the Lagrange multipliers and \(\beta \) is a penalty parameter that improves the numerical stability of the algorithm. Completing the squares, Eq. (28) becomes

$$\begin{aligned} \mathcal {F}&=||\varvec{\mathbf {E}}||_{F}^{2}+\frac{\lambda _{y}}{2}||\varvec{\mathbf {Y}}-\mathbf {H}^{T}\mathbf {G}||_{F}^{2}+\lambda _{uv}||\varvec{\mathbf {Z}}||_{1}+\frac{\lambda }{2}\bigg [||\varvec{\mathbf {U}}||_{F}^{2}+||\varvec{\mathbf {V}}||_{F}^{2}\bigg ]\\&\quad +\frac{\beta }{2}\bigg [||\varvec{\mathbf {U}}\varvec{\mathbf {V}}+\varvec{\mathbf {E}}-\varvec{\mathbf {X}}-\frac{\mathbf {L}}{\beta }||_{F}^{2}\\&\quad +||\varvec{\mathbf {U}}\varvec{\mathbf {V}}-\varvec{\mathbf {Z}}-\frac{\mathbf {Q}}{\beta }||_{F}^{2}+||\varvec{\mathbf {U}}\varvec{\mathbf {V}}-\mathbf {H}-\frac{\varvec{{{\varLambda }}}_{1}}{\beta }||_{F}^{2}+||\mathbf {G}-\mathbf {W}-\frac{\varvec{{{\varLambda }}}_{2}}{\beta }||_{F}^{2}\bigg ] \end{aligned}$$

Let \(\varvec{\mathbf {C}}_{1}^{t}=\varvec{\mathbf {X}}+\frac{\mathbf {L}^{t}}{\beta }-\varvec{\mathbf {E}}^{t}\), \(\varvec{\mathbf {C}}_{2}^{t}=\varvec{\mathbf {Z}}^{t}+\frac{\mathbf {Q}^{t}}{\beta }\) and \(\varvec{\mathbf {C}}_{3}^{t}=\mathbf {H}^{t}+\frac{\varvec{{{\varLambda }}}_{1}^{t}}{\beta }\); the updating steps for the variables are then as follows

$$\begin{aligned} \varvec{\mathbf {V}}^{t+1}&=\min _{\varvec{\mathbf {V}}}\frac{1}{2}||\varvec{\mathbf {U}}^{t}\varvec{\mathbf {V}}-\varvec{\mathbf {C}}_{1}^{t}||_{F}^{2}+\frac{1}{2}||\varvec{\mathbf {U}}^{t}\varvec{\mathbf {V}}-\varvec{\mathbf {C}}_{2}^{t}||_{F}^{2}+\frac{1}{2}||\varvec{\mathbf {U}}^{t}\varvec{\mathbf {V}}-\varvec{\mathbf {C}}_{3}^{t}||_{F}^{2}+\frac{\lambda }{2\beta }||\varvec{\mathbf {V}}||_{F}^{2}\\ \varvec{\mathbf {U}}^{t+1}&=\min _{\varvec{\mathbf {U}}}\frac{1}{2}||\varvec{\mathbf {U}}\varvec{\mathbf {V}}^{t+1}-\varvec{\mathbf {C}}_{1}^{t}||_{F}^{2}+\frac{1}{2}||\varvec{\mathbf {U}}\varvec{\mathbf {V}}^{t+1}-\varvec{\mathbf {C}}_{2}^{t}||_{F}^{2}+\frac{1}{2}||\varvec{\mathbf {U}}\varvec{\mathbf {V}}^{t+1}-\varvec{\mathbf {C}}_{3}^{t}||_{F}^{2}+\frac{\lambda }{2\beta }||\varvec{\mathbf {U}}||_{F}^{2}\\ \varvec{\mathbf {Z}}^{t+1}&=\min _{\varvec{\mathbf {Z}}\ge \mathbf {0}}\lambda _{uv}||\varvec{\mathbf {Z}}||_{1}+\frac{\beta }{2}||\varvec{\mathbf {Z}}-(\varvec{\mathbf {U}}^{t+1}\varvec{\mathbf {V}}^{t+1}-\frac{\mathbf {Q}^{t}}{\beta })||_{F}^{2}\\ \varvec{\mathbf {E}}^{t+1}&=\min _{\varvec{\mathbf {E}}}||\varvec{\mathbf {E}}||_{F}^{2}+\frac{\beta }{2}||\varvec{\mathbf {E}}-(\varvec{\mathbf {X}}+\frac{\mathbf {L}^{t}}{\beta }-\varvec{\mathbf {U}}^{t+1}\varvec{\mathbf {V}}^{t+1})||_{F}^{2}\\ \mathbf {H}^{t+1}&=\min _{\mathbf {H}}\bigg [\frac{\lambda _{y}}{2}||\varvec{\mathbf {Y}}-\mathbf {H}^{T}\mathbf {G}^{t}||_{F}^{2}+||\varvec{\mathbf {U}}^{t+1}\varvec{\mathbf {V}}^{t+1}-\mathbf {H}-\frac{\varvec{{{\varLambda }}}_{1}^{t}}{\beta }||_{F}^{2}\bigg ]\\ \mathbf {G}^{t+1}&=\min _{\mathbf {G}}\bigg [\frac{\lambda _{y}}{2}||\varvec{\mathbf {Y}}-(\mathbf {H}^{t+1})^{T}\mathbf {G}||_{F}^{2}+||\mathbf {G}-\mathbf {W}^{t}-\frac{\varvec{{{\varLambda }}}_{2}^{t}}{\beta }||_{F}^{2}\bigg ]\\ \mathbf {W}^{t+1}&=\min _{\mathbf {W}}\lambda _{w}||\mathbf {W}||_{*}+||\mathbf {G}^{t+1}-\frac{\varvec{{{\varLambda }}}_{2}^{t}}{\beta }-\mathbf {W}||_{F}^{2}\\ \mathbf {L}^{t+1}&=\mathbf {L}^{t}+\beta (\varvec{\mathbf {X}}-\varvec{\mathbf {U}}^{t+1}\varvec{\mathbf {V}}^{t+1}-\varvec{\mathbf {E}}^{t+1})\\ \mathbf {Q}^{t+1}&=\mathbf {Q}^{t}+\beta (\varvec{\mathbf {Z}}^{t+1}-\varvec{\mathbf {U}}^{t+1}\varvec{\mathbf {V}}^{t+1})\\ \varvec{{{\varLambda }}}_{1}^{t+1}&=\varvec{{{\varLambda }}}_{1}^{t}+\beta (\mathbf {H}^{t+1}-\varvec{\mathbf {U}}^{t+1}\varvec{\mathbf {V}}^{t+1})\\ \varvec{{{\varLambda }}}_{2}^{t+1}&=\varvec{{{\varLambda }}}_{2}^{t}+\beta (\mathbf {W}^{t+1}-\mathbf {G}^{t+1}) \end{aligned}$$
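The \(\varvec{\mathbf {Z}}\)-step above has a closed form: elementwise soft-thresholding (the proximal operator of the \(\ell _1\) norm) followed by projection onto the nonnegative orthant. A minimal sketch, where the function name and arguments are our own labels for the quantities \(\varvec{\mathbf {U}}^{t+1}\varvec{\mathbf {V}}^{t+1}-\frac{\mathbf {Q}^{t}}{\beta }\), \(\lambda _{uv}\) and \(\beta \):

```python
import numpy as np

def prox_l1_nonneg(M, lam_uv, beta):
    # Closed-form minimizer of  lam_uv*||Z||_1 + (beta/2)*||Z - M||_F^2  over Z >= 0:
    # shift every entry down by the threshold lam_uv/beta, then clip at zero.
    return np.maximum(M - lam_uv / beta, 0.0)
```

Because of the nonnegativity constraint, the usual two-sided soft-threshold reduces to a one-sided shift-and-clip.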

Updating the variables \(\varvec{\mathbf {U}}\), \(\varvec{\mathbf {V}}\), \(\varvec{\mathbf {Z}}\), \(\varvec{\mathbf {E}}\), \(\mathbf {H}\), \(\mathbf {G}\) and the Lagrange multipliers is similar to the optimization of the unsupervised framework (Sect. 3.3); for \(\mathbf {W}\), which involves the nuclear norm, the solution is given by

$$\begin{aligned}&\mathcal {P}(\mathbf {A},\lambda _{w})=\min _{\mathbf {W}}\bigg [||\mathbf {A}-\mathbf {W}||_{F}^{2}+\lambda _{w}||\mathbf {W}||_{*}\bigg ] \end{aligned}$$

where \(\mathbf {A}=\mathbf {G}^{t+1}-\frac{\varvec{{{\varLambda }}}_{2}^{t}}{\beta }\). If \(\varvec{\mathbf {U}}_{\ell }\) and \(\varvec{\mathbf {V}}_{r}\) are the left and right singular vectors of \(\mathbf {A}\) and \(\varvec{{{\varSigma }}}_{\mathbf {A}}\) is the diagonal matrix of its singular values, then \(\mathcal {P}(\mathbf {A},\lambda _{w})=\varvec{\mathbf {U}}_{\ell }\varvec{{{\varSigma }}}_{(\mathbf {A},\lambda _{w})}\varvec{\mathbf {V}}_{r}^{T}\), where \(\text {diag}(\varvec{{{\varSigma }}}_{(\mathbf {A},\lambda _{w})})=\max (0,\text {diag}(\varvec{{{\varSigma }}}_{\mathbf {A}})-\lambda _{w})\).
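The operator \(\mathcal {P}(\mathbf {A},\lambda _{w})\) is singular value thresholding. A minimal sketch implementing the thresholding rule as stated above (illustrative; the function name is ours):

```python
import numpy as np

def svt(A, lam_w):
    # Singular value thresholding: SVD of A, shrink the singular values by
    # lam_w (clipping at zero), then reconstruct. Small singular values are
    # zeroed, so the result has reduced rank.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam_w, 0.0)) @ Vt
```

Singular values below \(\lambda _{w}\) are set to zero, which is what drives \(\mathbf {W}\) toward low rank.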

About this article

Cite this article

Saha, B., Gupta, S., Phung, D. et al. Effective sparse imputation of patient conditions in electronic medical records for emergency risk predictions. Knowl Inf Syst 53, 179–206 (2017). https://doi.org/10.1007/s10115-017-1038-0
