
Effective sparse imputation of patient conditions in electronic medical records for emergency risk predictions

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Electronic medical records (EMRs) are increasingly used for “risk” prediction, where “risks” denote outcomes such as emergency presentation, readmission, and length of hospitalization. EMR data analysis, however, is complicated by missing entries, for two reasons: the “primary reason for admission” is coded in the EMR, but comorbidities (other chronic diseases) are often left uncoded; and many zero values in the data are accurate, reflecting that a patient has not accessed medical facilities. A key challenge is to deal with the peculiarities of these data: unlike many other datasets, EMR data are sparse, reflecting the fact that patients have some, but not all, diseases. We propose a novel model to fill in these missing values and use the new representation to predict key hospital events. To fill in missing values, we represent the feature-patient matrix as a product of two low-rank factors, preserving sparsity in the product. Intuitively, this product regularization allows sparse imputation of patient conditions that reflects comorbidities common across patients. We develop a scalable optimization algorithm based on block coordinate descent to find an optimal solution. We evaluate the proposed framework on two real-world EMR cohorts: Cancer (7000 admissions) and Acute Myocardial Infarction (AMI, 2652 admissions). Our results show that the AUC for 3-month emergency presentation prediction improves significantly, from 0.729 to 0.741 on the Cancer data and from 0.699 to 0.723 on the AMI data; similarly, the AUC for 3-month emergency admission prediction improves from 0.730 to 0.752 on the Cancer data and from 0.682 to 0.724 on the AMI data. We also extend the proposed method to a supervised model that predicts multiple related risk outcomes (e.g., emergency presentations and admissions over 3-, 6-, and 12-month periods) in an integrated framework. The supervised model consistently outperforms state-of-the-art baseline methods.
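The sparsity-preserving low-rank imputation described above can be sketched in a few lines. This is an illustrative alternating-least-squares approximation with a final soft-threshold, not the authors' exact block coordinate descent algorithm; the function and parameter names (`sparse_impute`, `rank`, `lam`, `lam_uv`) are assumptions for the sketch:

```python
import numpy as np

def sparse_impute(X, rank=5, lam=0.1, lam_uv=0.05, iters=50, seed=0):
    """Illustrative sketch: approximate the feature-patient matrix X by a
    low-rank product U @ V, then soft-threshold the product so the imputed
    matrix stays sparse and nonnegative. Not the paper's exact algorithm."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = rng.standard_normal((n, rank))
    V = rng.standard_normal((rank, d))
    for _ in range(iters):
        # Ridge-regularized alternating least squares on the two factors
        V = np.linalg.solve(U.T @ U + lam * np.eye(rank), U.T @ X)
        U = np.linalg.solve(V @ V.T + lam * np.eye(rank), V @ X.T).T
    Z = U @ V
    # Soft-threshold, then clip, to keep the imputation sparse and nonnegative
    Z = np.maximum(np.abs(Z) - lam_uv, 0.0) * np.sign(Z)
    return np.maximum(Z, 0.0)
```

Because the regularized product shares structure across patients, a patient with a zeroed comorbidity entry can receive a nonzero imputed value when similar patients carry that condition.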


Notes

  1. http://apps.who.int/classifications/icd10/browse/2010/en.

  2. Ethics approval obtained through the University and the Hospital (12/83).


Acknowledgements

This work is partially supported by the Telstra-Deakin Center of Excellence in Big Data and Machine Learning.

Author information

Corresponding author

Correspondence to Budhaditya Saha.

Appendix

Optimization steps for the supervised framework modeling multiple risk outcomes: the formulation of the framework can be expressed as

$$\begin{aligned}&\min _{\varvec{\mathbf {U}},\varvec{\mathbf {V}},\varvec{\mathbf {E}},\varvec{\mathbf {Z}}\ge \mathbf {0}}||\varvec{\mathbf {E}}||_{F}^{2}+\frac{\lambda _{y}}{2}||\varvec{\mathbf {Y}}-\mathbf {H}^{T}\mathbf {G}||_{F}^{2}+\lambda _{uv}||\varvec{\mathbf {Z}}||_{1}+\frac{\lambda }{2}\bigg [||\varvec{\mathbf {U}}||_{F}^{2}+||\varvec{\mathbf {V}}||_{F}^{2}\bigg ]+\lambda _{w}||\mathbf {W}||_{*}\nonumber \\&\quad \hbox {s.t. }\varvec{\mathbf {X}}-\varvec{\mathbf {U}}\varvec{\mathbf {V}}=\varvec{\mathbf {E}},\quad \varvec{\mathbf {U}}\varvec{\mathbf {V}}=\varvec{\mathbf {Z}},\quad \mathbf {H}=\varvec{\mathbf {U}}\varvec{\mathbf {V}},\quad \mathbf {G}=\mathbf {W}\end{aligned}$$
(27)

The Augmented Lagrangian of the above formulation can be expressed as

$$\begin{aligned} \mathcal {F}&=||\varvec{\mathbf {E}}||_{F}^{2}+\frac{\lambda _{y}}{2}||\varvec{\mathbf {Y}}-\mathbf {H}^{T}\mathbf {G}||_{F}^{2}+\lambda _{uv}||\varvec{\mathbf {Z}}||_{1}+\frac{\lambda }{2}\bigg [||\varvec{\mathbf {U}}||_{F}^{2}+||\varvec{\mathbf {V}}||_{F}^{2}\bigg ]+\lambda _{w}||\mathbf {W}||_{*}\nonumber \\&\quad +\text {tr}(\mathbf {L}^{T}(\varvec{\mathbf {X}}-\varvec{\mathbf {U}}\varvec{\mathbf {V}}-\varvec{\mathbf {E}}))+\frac{\beta }{2}||\varvec{\mathbf {U}}\varvec{\mathbf {V}}+\varvec{\mathbf {E}}-\varvec{\mathbf {X}}||_{F}^{2}+\text {tr}(\mathbf {Q}^{T}(\varvec{\mathbf {Z}}-\varvec{\mathbf {U}}\varvec{\mathbf {V}}))+\frac{\beta }{2}||\varvec{\mathbf {U}}\varvec{\mathbf {V}}-\varvec{\mathbf {Z}}||_{F}^{2}\nonumber \\&\quad +\text {tr}(\varvec{{{\varLambda }}}_{1}^{T}(\mathbf {H}-\varvec{\mathbf {U}}\varvec{\mathbf {V}}))+\frac{\beta }{2}||\varvec{\mathbf {U}}\varvec{\mathbf {V}}-\mathbf {H}||_{F}^{2}+\text {tr}(\varvec{{{\varLambda }}}_{2}^{T}(\mathbf {W}-\mathbf {G}))+\frac{\beta }{2}||\mathbf {G}-\mathbf {W}||_{F}^{2} \end{aligned}$$
(28)
$$\begin{aligned} \hbox {s.t.}&\quad \varvec{\mathbf {Z}}\ge \mathbf {0} \end{aligned}$$
(29)

where \(\mathbf {L}\), \(\mathbf {Q}\), \(\varvec{{{\varLambda }}}_{1}\) and \(\varvec{{{\varLambda }}}_{2}\) are the Lagrange multipliers and \(\beta \) is a penalty parameter that improves the numerical stability of the algorithm. Completing the squares, Eq. (28) becomes

$$\begin{aligned} \mathcal {F}&=||\varvec{\mathbf {E}}||_{F}^{2}+\frac{\lambda _{y}}{2}||\varvec{\mathbf {Y}}-\mathbf {H}^{T}\mathbf {G}||_{F}^{2}+\lambda _{uv}||\varvec{\mathbf {Z}}||_{1}+\frac{\lambda }{2}\bigg [||\varvec{\mathbf {U}}||_{F}^{2}+||\varvec{\mathbf {V}}||_{F}^{2}\bigg ]\\&\quad +\frac{\beta }{2}\bigg [||\varvec{\mathbf {U}}\varvec{\mathbf {V}}+\varvec{\mathbf {E}}-\varvec{\mathbf {X}}-\frac{\mathbf {L}}{\beta }||_{F}^{2}\\&\quad +||\varvec{\mathbf {U}}\varvec{\mathbf {V}}-\varvec{\mathbf {Z}}-\frac{\mathbf {Q}}{\beta }||_{F}^{2}+||\varvec{\mathbf {U}}\varvec{\mathbf {V}}-\mathbf {H}-\frac{\varvec{{{\varLambda }}}_{1}}{\beta }||_{F}^{2}+||\mathbf {G}-\mathbf {W}-\frac{\varvec{{{\varLambda }}}_{2}}{\beta }||_{F}^{2}\bigg ] \end{aligned}$$

Let \(\varvec{\mathbf {C}}_{1}^{t}=\varvec{\mathbf {X}}+\frac{\mathbf {L}^{t}}{\beta }-\varvec{\mathbf {E}}^{t}\), \(\varvec{\mathbf {C}}_{2}^{t}=\varvec{\mathbf {Z}}^{t}+\frac{\mathbf {Q}^{t}}{\beta }\) and \(\varvec{\mathbf {C}}_{3}^{t}=\mathbf {H}^{t}+\frac{\varvec{{{\varLambda }}}_{1}^{t}}{\beta }\); the updating steps for the variables are then as follows

$$\begin{aligned} \varvec{\mathbf {V}}^{t+1}&=\min _{\varvec{\mathbf {V}}}\frac{1}{2}||\varvec{\mathbf {U}}^{t}\varvec{\mathbf {V}}-\varvec{\mathbf {C}}_{1}^{t}||_{F}^{2}+\frac{1}{2}||\varvec{\mathbf {U}}^{t}\varvec{\mathbf {V}}-\varvec{\mathbf {C}}_{2}^{t}||_{F}^{2}+\frac{1}{2}||\varvec{\mathbf {U}}^{t}\varvec{\mathbf {V}}-\varvec{\mathbf {C}}_{3}^{t}||_{F}^{2}+\frac{\lambda }{2\beta }||\varvec{\mathbf {V}}||_{F}^{2}\\ \varvec{\mathbf {U}}^{t+1}&=\min _{\varvec{\mathbf {U}}}\frac{1}{2}||\varvec{\mathbf {U}}\varvec{\mathbf {V}}^{t+1}-\varvec{\mathbf {C}}_{1}^{t}||_{F}^{2}+\frac{1}{2}||\varvec{\mathbf {U}}\varvec{\mathbf {V}}^{t+1}-\varvec{\mathbf {C}}_{2}^{t}||_{F}^{2}+\frac{1}{2}||\varvec{\mathbf {U}}\varvec{\mathbf {V}}^{t+1}-\varvec{\mathbf {C}}_{3}^{t}||_{F}^{2}+\frac{\lambda }{2\beta }||\varvec{\mathbf {U}}||_{F}^{2}\\ \varvec{\mathbf {Z}}^{t+1}&=\min _{\varvec{\mathbf {Z}}\ge \mathbf {0}}\lambda _{uv}||\varvec{\mathbf {Z}}||_{1}+\frac{\beta }{2}||\varvec{\mathbf {Z}}-(\varvec{\mathbf {U}}^{t+1}\varvec{\mathbf {V}}^{t+1}-\frac{\mathbf {Q}^{t}}{\beta })||_{F}^{2}\\ \varvec{\mathbf {E}}^{t+1}&=\min _{\varvec{\mathbf {E}}}||\varvec{\mathbf {E}}||_{F}^{2}+\frac{\beta }{2}||\varvec{\mathbf {E}}-(\varvec{\mathbf {X}}+\frac{\mathbf {L}^{t}}{\beta }-\varvec{\mathbf {U}}^{t+1}\varvec{\mathbf {V}}^{t+1})||_{F}^{2}\\ \mathbf {H}^{t+1}&=\min _{\mathbf {H}}\bigg [\frac{\lambda _{y}}{2}||\varvec{\mathbf {Y}}-\mathbf {H}^{T}\mathbf {G}^{t}||_{F}^{2}+||\varvec{\mathbf {U}}^{t+1}\varvec{\mathbf {V}}^{t+1}-\mathbf {H}-\frac{\varvec{{{\varLambda }}}_{1}^{t}}{\beta }||_{F}^{2}\bigg ]\\ \mathbf {G}^{t+1}&=\min _{\mathbf {G}}\bigg [\frac{\lambda _{y}}{2}||\varvec{\mathbf {Y}}-(\mathbf {H}^{t+1})^{T}\mathbf {G}||_{F}^{2}+||\mathbf {G}-\mathbf {W}^{t}-\frac{\varvec{{{\varLambda }}}_{2}^{t}}{\beta }||_{F}^{2}\bigg ]\\ \mathbf {W}^{t+1}&=\min _{\mathbf {W}}\lambda _{w}||\mathbf {W}||_{*}+||\mathbf {G}^{t+1}-\frac{\varvec{{{\varLambda }}}_{2}^{t}}{\beta }-\mathbf {W}||_{F}^{2}\\ \mathbf {L}^{t+1}&=\mathbf {L}^{t}+\beta (\varvec{\mathbf {X}}-\varvec{\mathbf {U}}^{t+1}\varvec{\mathbf {V}}^{t+1}-\varvec{\mathbf {E}}^{t+1})\\ \mathbf {Q}^{t+1}&=\mathbf {Q}^{t}+\beta (\varvec{\mathbf {Z}}^{t+1}-\varvec{\mathbf {U}}^{t+1}\varvec{\mathbf {V}}^{t+1})\\ \varvec{{{\varLambda }}}_{1}^{t+1}&=\varvec{{{\varLambda }}}_{1}^{t}+\beta (\mathbf {H}^{t+1}-\varvec{\mathbf {U}}^{t+1}\varvec{\mathbf {V}}^{t+1})\\ \varvec{{{\varLambda }}}_{2}^{t+1}&=\varvec{{{\varLambda }}}_{2}^{t}+\beta (\mathbf {W}^{t+1}-\mathbf {G}^{t+1}) \end{aligned}$$
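The \(\varvec{\mathbf {Z}}\)-step above has a closed form: elementwise soft-thresholding (the proximal operator of the \(\ell _1\) norm) followed by projection onto the nonnegative orthant. A minimal sketch, where the function name and arguments are our own labels for the quantities \(\varvec{\mathbf {U}}^{t+1}\varvec{\mathbf {V}}^{t+1}-\frac{\mathbf {Q}^{t}}{\beta }\), \(\lambda _{uv}\) and \(\beta \):

```python
import numpy as np

def prox_l1_nonneg(M, lam_uv, beta):
    # Closed-form minimizer of  lam_uv*||Z||_1 + (beta/2)*||Z - M||_F^2  over Z >= 0:
    # shift every entry down by the threshold lam_uv/beta, then clip at zero.
    return np.maximum(M - lam_uv / beta, 0.0)
```

Because of the nonnegativity constraint, the usual two-sided soft-threshold reduces to a one-sided shift-and-clip.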

Updating the variables \(\varvec{\mathbf {U}}\), \(\varvec{\mathbf {V}}\), \(\varvec{\mathbf {Z}}\), \(\varvec{\mathbf {E}}\), \(\mathbf {H}\), \(\mathbf {G}\) and the Lagrange multipliers is similar to the optimization of the unsupervised framework (Sect. 3.3); for \(\mathbf {W}\), which involves the nuclear norm, the solution is given by

$$\begin{aligned}&\mathcal {P}(\mathbf {A},\lambda _{w})=\min _{\mathbf {W}}\bigg [||\mathbf {A}-\mathbf {W}||_{F}^{2}+\lambda _{w}||\mathbf {W}||_{*}\bigg ] \end{aligned}$$

where \(\mathbf {A}=\mathbf {G}^{t+1}-\frac{\varvec{{{\varLambda }}}_{2}^{t}}{\beta }\). If \(\varvec{\mathbf {U}}_{\ell }\) and \(\varvec{\mathbf {V}}_{r}\) are the left and right singular vectors of \(\mathbf {A}\) and \(\varvec{{{\varSigma }}}_{\mathbf {A}}\) is the diagonal matrix of its singular values, then \(\mathcal {P}(\mathbf {A},\lambda _{w})=\varvec{\mathbf {U}}_{\ell }\varvec{{{\varSigma }}}_{(\mathbf {A},\lambda _{w})}\varvec{\mathbf {V}}_{r}^{T}\), where \(\text {diag}(\varvec{{{\varSigma }}}_{(\mathbf {A},\lambda _{w})})=\max (0,\text {diag}(\varvec{{{\varSigma }}}_{\mathbf {A}})-\lambda _{w})\).
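The operator \(\mathcal {P}(\mathbf {A},\lambda _{w})\) is singular value thresholding. A minimal sketch implementing the thresholding rule as stated above (illustrative; the function name is ours):

```python
import numpy as np

def svt(A, lam_w):
    # Singular value thresholding: SVD of A, shrink the singular values by
    # lam_w (clipping at zero), then reconstruct. Small singular values are
    # zeroed, so the result has reduced rank.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam_w, 0.0)) @ Vt
```

Singular values below \(\lambda _{w}\) are set to zero, which is what drives \(\mathbf {W}\) toward low rank.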

About this article

Cite this article

Saha, B., Gupta, S., Phung, D. et al. Effective sparse imputation of patient conditions in electronic medical records for emergency risk predictions. Knowl Inf Syst 53, 179–206 (2017). https://doi.org/10.1007/s10115-017-1038-0
