Effective Identification of Similar Patients Through Sequential Matching over ICD Code Embedding

Nguyen, Dang; Luo, Wei; Venkatesh, Svetha; Phung, Dinh

doi:10.1007/s10916-018-0951-4

Patient Facing Systems
Published: 11 April 2018

Volume 42, article number 94, (2018)

Journal of Medical Systems


Abstract

Evidence-based medicine often involves identifying patients with similar conditions, which are typically captured in International Classification of Diseases (ICD; World Health Organization 2013) code sequences. With no satisfactory prior solution for matching ICD-10 code sequences, this paper presents a method that effectively captures the clinical similarity among routine patients who have multiple comorbidities and complex care needs. Our method leverages recent progress in representation learning of individual ICD-10 codes, and it explicitly uses the sequential order of codes for matching. Empirical evaluation on a state-wide cancer data collection shows that our proposed method achieves significantly higher matching performance than state-of-the-art methods that ignore the sequential order. Our method better identifies similar patients with respect to a number of clinical outcomes, including readmission and mortality outlook. Although this paper focuses on ICD-10 diagnosis code sequences, our method can be adapted to work with other codified sequence data.
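
The contrast the abstract draws, between matching that respects the order of codes and matching that ignores it, can be illustrated with a minimal sketch. This is not the authors' WVM method (their R code is linked in the Notes). It assumes a simple dynamic-programming alignment chosen for illustration, and the three-dimensional embedding values, the two patient sequences, and the function names `ordered_similarity` and `unordered_similarity` are all hypothetical. In practice the code vectors would be learned from data rather than written by hand.

```python
import math

# Hypothetical toy embeddings for three ICD-10 codes (values are made up;
# real vectors would be learned, e.g. with a word2vec-style model).
toy_embeddings = {
    "C50": [1.0, 0.0, 0.0],  # malignant neoplasm of breast
    "I10": [0.0, 1.0, 0.0],  # essential hypertension
    "E11": [0.0, 0.0, 1.0],  # type 2 diabetes mellitus
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def ordered_similarity(seq_a, seq_b, emb):
    """Order-preserving soft alignment (an illustrative assumption, not WVM).

    Works like longest-common-subsequence, but a matched pair of codes
    contributes its (non-negative) cosine similarity instead of 0 or 1.
    The total is normalised by the longer sequence length.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        return 0.0
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = max(0.0, cosine(emb[seq_a[i - 1]], emb[seq_b[j - 1]]))
            best[i][j] = max(best[i - 1][j - 1] + sim,  # align the two codes
                             best[i - 1][j],            # skip a code in seq_a
                             best[i][j - 1])            # skip a code in seq_b
    return best[n][m] / max(n, m)

def unordered_similarity(seq_a, seq_b, emb):
    """Order-ignoring baseline: cosine of the mean code vectors."""
    def mean(seq):
        dim = len(emb[seq[0]])
        return [sum(emb[c][k] for c in seq) / len(seq) for k in range(dim)]
    return cosine(mean(seq_a), mean(seq_b))

# Two hypothetical patients with the same codes in reverse order.
patient_a = ["C50", "I10", "E11"]
patient_b = ["E11", "I10", "C50"]

print("ordered:  ", ordered_similarity(patient_a, patient_b, toy_embeddings))
print("unordered:", unordered_similarity(patient_a, patient_b, toy_embeddings))
```

In this toy setting the order-ignoring baseline cannot tell the two patients apart, because their averaged vectors are identical, while the order-preserving score drops because only one code can be aligned without crossing.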

Notes

The R code for WVM and the other methods is available at https://github.com/nphdang/WVM
Ethics approval was obtained from the NSW Population and Health Services Research Ethics Committee (AU RED Reference: HREC/15/CIPHS/1).
We exclude the runtime of learning ICD code vectors in CSM and our WVM because this step is negligible, taking only 106 seconds in our experiment.

References

World Health Organization: International Classification of Diseases (ICD). http://www.who.int/classifications/icd/en/, 2013
World Health Organization: International statistical classification of diseases and related health problems 10th revision. [Online]. Available: http://apps.who.int/classifications/icd10/browse/2010/en, 2010
Australian Consortium for Classification Development: ICD-10-AM. [Online]. Available: https://www.accd.net.au/Icd10.aspx, 2017
O’Malley, K., Cook, K., Price, M., Wildes, K. R., Hurdle, J., and Ashton, C., Measuring diagnoses: ICD code accuracy. Health Serv. Res. 40:1620–1639, 2005.
Wang, F., Hu, J., and Sun, J.: Medical prognosis based on patient similarity and expert feedback. In: The 21st International Conference on Pattern Recognition, pp. 1799–1802, IEEE, 2012.
Choi, E., Schuetz, A., Stewart, W. F., and Sun, J.: Medical concept representation learning from electronic health records and its application on heart failure prediction. arXiv:1602.03686, 2016
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119, 2013.
Lee, J., Maslove, D.M., and Dubin, J., Personalized mortality prediction driven by electronic medical data and a patient similarity metric. PloS One 10(5):e0127428, 2015.
Carnaby-Mann, G., and Crary, M., McNeill dysphagia therapy program: a case-control study. Arch. Phys. Med. Rehabil. 91(5):743–749, 2010.
Hielscher, T., Spiliopoulou, M., Völzke, H., and Kühn, J.-P.: Using participant similarity for the classification of epidemiological data on hepatic steatosis. In: The 27th International Symposium on Computer-Based Medical Systems, pp. 1–7, IEEE, 2014.
Le, Q., and Mikolov, T.: Distributed representations of sentences and documents. In: ICML, pp. 1188–1196, 2014.
Levy, O., Goldberg, Y., and Dagan, I., Improving distributional similarity with lessons learned from word embeddings. Trans. Assoc. Comput. Linguist. 3:211–225, 2015.
Grover, A., and Leskovec, J.: node2vec: scalable feature learning for networks. In: KDD, pp. 855–864, ACM, 2016.
Nguyen, D., Luo, W., Nguyen, T. D., Venkatesh, S., and Phung, D.: Learning graph representation via frequent subgraphs. In: SDM. Accepted, SIAM, 2018.
Moen, H., Ginter, F., Marsi, E., Peltonen, L.-M., Salakoski, T., and Salanterä, S., Care episode retrieval: distributional semantic models for information retrieval in the clinical domain. BMC Med. Inform. Decis. Mak. 15(2):1, 2015.
Nguyen, P., Tran, T., Wickramasinghe, N., and Venkatesh, S., Deepr: a convolutional net for medical records. IEEE J. Biomed. Health Inform. 21(1):22–30, 2017.
Choi, E., Bahadori, M. T., Searles, E., Coffey, C., Thompson, M., Bost, J., Tejedor-Sojo, J., and Sun, J.: Multi-layer representation learning for medical concepts. In: KDD, pp. 1495–1504, ACM, 2016.
Choi, Y., Chiu, C. Y.-I., and Sontag, D.: Learning low-dimensional representations of medical concepts. In: AMIA Summits on Translational Science Proceedings, pp. 41–51, 2016.
Mikolov, T., Chen, K., Corrado, G., and Dean, J.: Efficient estimation of word representations in vector space. arXiv:1301.3781, 2013
Pearce, N., Analysis of matched case-control studies. BMJ 352:i969, 2016.
Nguyen, D., Luo, W., Phung, D., and Venkatesh, S.: Exceptional contrast set mining: moving beyond the deluge of the obvious. In: Australasian Joint Conference on Artificial Intelligence, pp. 455–468. Springer, Berlin, 2016.
Bigus, J., Campbell, M., Carmeli, B., Cefkin, M., Chang, H., Chen-Ritzo, C.-H., Cody, W., Ebadollahi, S., Evfimievski, A., Farkash, A., et al., Information technology for healthcare transformation. IBM Journal of Research and Development 55(5):6–20, 2011.
Thomas, K., Rahman, M., Mor, V., and Intrator, O., Influence of hospital and nursing home quality on hospital readmissions. The American Journal of Managed Care 20(11):e523, 2014.
Håkonsen, S., Pedersen, P., Bjerrum, M., Bygholm, A., and Peters, M., Nursing minimum data sets for documenting nutritional care for adults in primary healthcare: a scoping review. JBI Database of Systematic Reviews and Implementation Reports 16(1):117–139, 2018.
Maaten, L. V. D., and Hinton, G., Visualizing data using t-SNE. Journal of Machine Learning Research 9:2579–2605, 2008.
Futoma, J., Morris, J., and Lucas, J., A comparison of models for predicting early hospital readmissions. Journal of Biomedical Informatics 56:229–238, 2015.
Pham, T., Tran, T., Phung, D., and Venkatesh, S.: Deepcare: a deep dynamic memory model for predictive medicine. In: PAKDD, pp. 30–41. Springer, Berlin, 2016.
Turgeman, L., May, J., and Sciulli, R., Insights from a machine learning model for predicting the hospital length of stay (los) at the time of admission. Expert Systems with Applications 78:376–385, 2017.
Chaou, C.-H., Chen, H.-H., Chang, S.-H., Tang, P., Pan, S.-L., Yen, A. M.-F., and Chiu, T.-F., Predicting length of stay among patients discharged from the emergency department using an accelerated failure time model. PloS One 12(1):e0165756, 2017.
Nguyen, D., Nguyen, T. D., Luo, W., and Venkatesh, S.: Trans2vec: learning transaction embedding via items and frequent itemsets. In: PAKDD. Accepted. Springer, Berlin, 2018.
Pobiedina, N., and Ichise, R., Citation count prediction as a link prediction problem. Applied Intelligence 44(2):252–268, 2016.

Acknowledgments

This work is partially supported by the Telstra-Deakin Centre of Excellence (CoE) in Big Data and Machine Learning. Dinh Phung gratefully acknowledges the partial support from the Australian Research Council (ARC).

Author information

Authors and Affiliations

Centre for Pattern Recognition and Data Analytics, School of Information Technology, Deakin University, Geelong, Australia
Dang Nguyen, Wei Luo, Svetha Venkatesh & Dinh Phung

Corresponding author

Correspondence to Dang Nguyen.

Ethics declarations

Conflict of Interest

The authors have no conflict of interest to declare.

Ethical Approval

Ethics approval was obtained from the New South Wales Population and Health Services Research Ethics Committee (AU RED Reference: HREC/15/CIPHS/1).

Informed Consent

This study is a secondary analysis of routinely collected data; consent had been obtained by the original data guarantor.

Additional information

This article is part of the Topical Collection on Patient Facing Systems

About this article

Cite this article

Nguyen, D., Luo, W., Venkatesh, S. et al. Effective Identification of Similar Patients Through Sequential Matching over ICD Code Embedding. J Med Syst 42, 94 (2018). https://doi.org/10.1007/s10916-018-0951-4

Received: 14 February 2018
Accepted: 26 March 2018
Published: 11 April 2018
DOI: https://doi.org/10.1007/s10916-018-0951-4
