Abstract
Electronic medical records (EMRs) are increasingly used for "risk" prediction. By "risks," we denote outcomes such as emergency presentation, readmission, and the length of hospitalization. However, EMR data analysis is complicated by missing entries, for two reasons: the "primary reason for admission" is included in the EMR, but the comorbidities (other chronic diseases) are left uncoded; and many zero values in the data are accurate, reflecting that a patient has not accessed medical facilities. A key challenge is to deal with the peculiarities of these data: unlike many other datasets, EMR data are sparse, reflecting the fact that patients have some but not all diseases. We propose a novel model that fills in these missing values and uses the new representation to predict key hospital events. To fill in missing values, we represent the feature-patient matrix as a product of two low-rank factors, preserving the sparsity property in the product. Intuitively, the product regularization allows sparse imputation of patient conditions, reflecting common comorbidities across patients. We develop a scalable optimization algorithm based on the block coordinate descent method to find an optimal solution. We evaluate the proposed framework on two real-world EMR cohorts: Cancer (7000 admissions) and Acute Myocardial Infarction (AMI; 2652 admissions). Our results show that the AUC for 3-month emergency presentation prediction improves significantly, from 0.729 to 0.741 for the Cancer data and from 0.699 to 0.723 for the AMI data. Similarly, the AUC for 3-month emergency admission prediction improves from 0.730 to 0.752 for the Cancer data and from 0.682 to 0.724 for the AMI data. We also extend the proposed method to a supervised model that predicts multiple related risk outcomes (e.g., emergency presentations and admissions in hospital over 3-, 6-, and 12-month periods) in an integrated framework. The supervised model consistently outperforms state-of-the-art baseline methods.
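The idea of imputing a sparse feature-patient matrix through a product of two low-rank factors can be illustrated with a small sketch. It is not the authors' model: the exact objective and product regularizer are not given here, so this sketch assumes a stand-in, namely a nonnegative factorization \(\mathbf{X}\approx\mathbf{U}\mathbf{V}\) with \(\ell_1\) penalties on both factors, fitted by block coordinate descent in which each block (\(\mathbf{U}\), then \(\mathbf{V}\)) takes one proximal-gradient step. All names (`sparse_lowrank_impute`, `objective`, `lam_u`, `lam_v`, `mask`) are hypothetical.

```python
import numpy as np


def prox_l1_nonneg(A, tau):
    # Proximal operator of tau * ||A||_1 combined with the constraint A >= 0
    # (nonnegative factors are an assumption of this sketch).
    return np.maximum(A - tau, 0.0)


def objective(X, U, V, lam_u, lam_v, mask=None):
    # 0.5 * ||mask * (X - U V)||_F^2 + lam_u * ||U||_1 + lam_v * ||V||_1
    M = np.ones_like(X, dtype=float) if mask is None else mask.astype(float)
    R = M * (X - U @ V)
    return 0.5 * np.sum(R ** 2) + lam_u * np.abs(U).sum() + lam_v * np.abs(V).sum()


def sparse_lowrank_impute(X, rank, lam_u=0.1, lam_v=0.1, mask=None,
                          n_iter=200, seed=0):
    """Hypothetical sketch: X (features x patients) ~ U @ V with sparse, nonnegative factors.

    Block coordinate descent: each sweep takes one proximal-gradient step on U
    (V fixed) and then one on V (U fixed), so the objective never increases.
    `mask` is binary (1 = observed); by default every entry counts as observed.
    """
    rng = np.random.default_rng(seed)
    n_feat, n_pat = X.shape
    M = np.ones_like(X, dtype=float) if mask is None else mask.astype(float)
    U = rng.uniform(0.0, 1.0, size=(n_feat, rank))
    V = rng.uniform(0.0, 1.0, size=(rank, n_pat))
    for _ in range(n_iter):
        # U-block: the gradient of the smooth term is (M * (UV - X)) V^T.
        R = M * (U @ V - X)
        step_u = 1.0 / (np.linalg.norm(V, 2) ** 2 + 1e-12)  # 1 / Lipschitz bound
        U = prox_l1_nonneg(U - step_u * (R @ V.T), step_u * lam_u)
        # V-block, using the freshly updated U.
        R = M * (U @ V - X)
        step_v = 1.0 / (np.linalg.norm(U, 2) ** 2 + 1e-12)
        V = prox_l1_nonneg(V - step_v * (U.T @ R), step_v * lam_v)
    return U, V


# Usage on a small synthetic binary matrix (8 features x 20 patients).
X_demo = (np.random.default_rng(1).uniform(size=(8, 20)) < 0.3).astype(float)
U_demo, V_demo = sparse_lowrank_impute(X_demo, rank=3, lam_u=0.05, lam_v=0.05)
X_hat = U_demo @ V_demo  # large scores where X_demo is 0 suggest candidate uncoded conditions
```

The \(\ell_1\) penalties encourage sparse factors, which tends to keep the imputed matrix sparse as well; using a step of at most one over the Lipschitz constant in each block is what guarantees that every sweep does not increase the objective.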

Notes
Ethics approval was obtained through the University and the Hospital (ref. 12/83).
Acknowledgements
This work is partially supported by the Telstra-Deakin Center of Excellence in Big Data and Machine Learning.
Appendix
The optimization steps for the supervised framework that models multiple risk outcomes are as follows. The formulation of the framework can be expressed as
The augmented Lagrangian of the above formulation can be expressed as
where \(\mathbf{L}\), \(\mathbf{Q}\), \(\boldsymbol{\Lambda}_{1}\) and \(\boldsymbol{\Lambda}_{2}\) are the Lagrange multipliers and \(\beta\) is a penalty parameter that improves the numerical stability of the algorithm. After some manipulation, Eq. (29) becomes
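The manipulation is presumably the standard completion of the square used in augmented-Lagrangian methods. As a sketch, assume that each multiplier \(\boldsymbol{\Lambda}\) enters Eq. (29) through an inner product \(\langle \boldsymbol{\Lambda}, \mathbf{R} \rangle\) with its constraint residual \(\mathbf{R}\), together with the penalty \(\frac{\beta}{2}\|\mathbf{R}\|_F^2\). Then
\[
\langle \boldsymbol{\Lambda}, \mathbf{R} \rangle + \frac{\beta}{2}\|\mathbf{R}\|_F^2
= \frac{\beta}{2}\Big\|\mathbf{R} + \frac{\boldsymbol{\Lambda}}{\beta}\Big\|_F^2 - \frac{1}{2\beta}\|\boldsymbol{\Lambda}\|_F^2 ,
\]
and the last term does not depend on the primal variables, which explains why the shifted matrices \(\mathbf{C}_1^t\), \(\mathbf{C}_2^t\) and \(\mathbf{C}_3^t\) each take the form "matrix + multiplier\(/\beta\)". For example, if \(\mathbf{W}\) is coupled to \(\mathbf{G}\) through the constraint \(\mathbf{W}-\mathbf{G}=\mathbf{0}\) with multiplier \(\boldsymbol{\Lambda}_2\) (an assumption about the formulation), then taking \(\mathbf{R}=\mathbf{W}-\mathbf{G}^{t+1}\) the \(\mathbf{W}\)-subproblem becomes
\[
\mathbf{W}^{t+1} = \arg\min_{\mathbf{W}} \; \lambda_w\|\mathbf{W}\|_* + \frac{\beta}{2}\|\mathbf{W}-\mathbf{A}\|_F^2 ,
\qquad \mathbf{A} = \mathbf{G}^{t+1} - \frac{\boldsymbol{\Lambda}_2^{t}}{\beta},
\]
whose minimizer is the singular value thresholding of \(\mathbf{A}\) with threshold \(\lambda_w/\beta\) under this scaling (the threshold reads \(\lambda_w\) if the nuclear-norm weight is taken relative to \(\beta\)).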
Let \(\mathbf{C}_{1}^{t}=\mathbf{X}+\frac{\mathbf{L}^{t}}{\beta}-\mathbf{S}^{t}\), \(\mathbf{C}_{2}^{t}=\mathbf{Z}^{k}+\frac{\mathbf{Q}^{t}}{\beta}\) and \(\mathbf{C}_{3}^{t}=\mathbf{G}^{t}+\frac{\mathbf{E}_{1}^{t}}{\beta}\). The variables are then updated as follows:
The updates of \(\mathbf{U}\), \(\mathbf{V}\), \(\mathbf{Z}\), \(\mathbf{E}\), \(\mathbf{H}\), \(\mathbf{G}\) and the Lagrange multipliers are similar to those in the optimization of the unsupervised framework (Sect. 3.3). For \(\mathbf{W}\), which is regularized by the nuclear norm, the solution is given by
where \(\mathbf{A}=\mathbf{G}^{t+1}-\frac{\boldsymbol{\Lambda}_{2}^{t}}{\beta}\). If \(\mathbf{U}_{\ell}\) and \(\mathbf{V}_{r}\) are the left and right singular vectors of \(\mathbf{A}\) and \(\boldsymbol{\Sigma}_{\mathbf{A}}\) is the diagonal matrix of its singular values, then \(\mathcal{P}(\mathbf{A},\lambda_{w})=\mathbf{U}_{\ell}\boldsymbol{\Sigma}_{(\mathbf{A},\lambda_{w})}\mathbf{V}_{r}^{T}\), where \(\operatorname{diag}(\boldsymbol{\Sigma}_{(\mathbf{A},\lambda_{w})})=\max(0,\operatorname{diag}(\boldsymbol{\Sigma}_{\mathbf{A}})-\lambda_{w})\), with the maximum taken elementwise.
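The operator \(\mathcal{P}\) is singular value thresholding. A minimal NumPy sketch follows; the function name `svt` and the random stand-ins for \(\mathbf{G}^{t+1}\), \(\boldsymbol{\Lambda}_2^t\), \(\beta\) and \(\lambda_w\) are illustrative assumptions, and the threshold is passed in as `tau` so it can be set to \(\lambda_w\) as in the text.

```python
import numpy as np


def svt(A, tau):
    """Singular value thresholding: U_l diag(max(0, s - tau)) V_r^T."""
    # Thin SVD: A = U_l @ diag(s) @ Vt, where the rows of Vt are the right singular vectors.
    U_l, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)  # shrink each singular value, clip at zero
    # Scaling the columns of U_l is the same as U_l @ np.diag(s_shrunk).
    return (U_l * s_shrunk) @ Vt


# Usage with hypothetical stand-ins for G^{t+1}, Lambda_2^t, beta and lambda_w.
rng = np.random.default_rng(0)
G_next = rng.normal(size=(6, 4))
Lambda2 = rng.normal(size=(6, 4))
beta, lam_w = 1.0, 0.5
A = G_next - Lambda2 / beta
W_next = svt(A, lam_w)
```

Shrinking singular values rather than individual entries is what makes \(\mathcal{P}(\cdot,\tau)\) the proximal operator of \(\tau\|\cdot\|_*\): singular values below the threshold are set to zero, so \(\mathbf{W}\) is pushed toward low rank.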
Cite this article
Saha, B., Gupta, S., Phung, D. et al. Effective sparse imputation of patient conditions in electronic medical records for emergency risk predictions. Knowl Inf Syst 53, 179–206 (2017). https://doi.org/10.1007/s10115-017-1038-0