research-article

LSAN: Modeling Long-term Dependencies and Short-term Correlations with Hierarchical Attention for Risk Prediction

Authors:
Muchao Ye, Pennsylvania State University, University Park, PA, USA
Junyu Luo, Pennsylvania State University, University Park, PA, USA
Cao Xiao, IQVIA, Cambridge, MA, USA
Fenglong Ma, Pennsylvania State University, University Park, PA, USA

CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, October 2020, Pages 1753–1762. https://doi.org/10.1145/3340531.3411864

Published: 19 October 2020

ABSTRACT

Risk prediction using electronic health records (EHR) is a challenging data mining task due to the two-level hierarchical structure of EHR data: EHR data consist of a set of time-ordered visits, and each visit contains a set of unordered diagnosis codes. Existing approaches focus on modeling temporal visits with deep neural network (DNN) techniques. However, they ignore the importance of modeling the diagnosis codes within visits, and the large amount of task-unrelated information within visits often leads to unsatisfactory performance. To minimize the effect of noisy information in EHR data, in this paper we propose a novel DNN for risk prediction, termed LSAN, which consists of a Hierarchical Attention Module (HAM) and a Temporal Aggregation Module (TAM). Specifically, LSAN applies HAM to model the hierarchical structure of EHR data. Using the attention mechanism at the diagnosis-code level, HAM retains diagnosis details and assigns flexible attention weights to different diagnosis codes according to their relevance to the corresponding diseases. Moreover, the attention mechanism at the visit level learns a comprehensive feature across the visit history by paying greater attention to more relevant visits. Building on the foundation laid by HAM, TAM uses a two-pathway structure to learn a robust temporal aggregation mechanism over all visits: it extracts long-term dependencies among visits with a Transformer encoder and short-term correlations with a parallel convolutional layer. With HAM and TAM, LSAN achieves state-of-the-art performance on three real-world datasets, with higher AUCs, recalls, and F1 scores. Furthermore, model analysis demonstrates the effectiveness of the network design, as well as the interpretability and robustness of LSAN's decision making.
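The two-level attention and two-pathway aggregation described above can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation, and the exact LSAN equations are not given here, so every design choice below is an assumption: dot-product attention against a query vector at the code and visit levels, a single-head self-attention without learned projections standing in for the Transformer encoder, a three-tap convolution over the visit axis, mean pooling over time, and a concatenation-plus-logistic output. All names (`attend`, `self_attention`, `conv1d`, `risk_score`, the toy codes `c1` to `c5`) are hypothetical, and the parameters are random and untrained.

```python
import math
import random

# Minimal LSAN-style sketch (hypothetical names and design choices, NOT the
# authors' code): code-level and visit-level attention (HAM-like), then a
# long-term self-attention pathway and a short-term convolution pathway (TAM-like).

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(vectors, query):
    # Attention pooling: score each vector against a query vector,
    # normalize with softmax, and return the weighted sum plus the weights.
    weights = softmax([dot(query, v) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights

def self_attention(visits):
    # Single-head scaled dot-product self-attention with no learned projections,
    # a stand-in for a Transformer encoder: every visit attends to all visits.
    scale = math.sqrt(len(visits[0]))
    return [attend(visits, [x / scale for x in q])[0] for q in visits]

def conv1d(visits, kernel):
    # Zero-padded 1D convolution (cross-correlation, as in most DL libraries)
    # along the visit axis with one shared kernel: each output mixes a visit
    # with its immediate neighbours only, i.e. a short-term window.
    half = len(kernel) // 2
    dim = len(visits[0])
    out = []
    for t in range(len(visits)):
        row = [0.0] * dim
        for k, w in enumerate(kernel):
            s = t + k - half
            if 0 <= s < len(visits):
                for d in range(dim):
                    row[d] += w * visits[s][d]
        out.append(row)
    return out

def mean_pool(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def risk_score(patient, embeddings, params):
    # Code level: attention over the unordered codes inside each visit.
    visits = [attend([embeddings[c] for c in visit], params["code_query"])[0]
              for visit in patient]
    # Visit level: attention over visits yields one history-wide feature.
    global_feat, visit_weights = attend(visits, params["visit_query"])
    # Two temporal pathways over the same visit vectors, mean-pooled over time.
    long_term = mean_pool(self_attention(visits))
    short_term = mean_pool(conv1d(visits, params["kernel"]))
    # Assumed fusion: concatenate the three features, then logistic regression.
    features = global_feat + long_term + short_term
    logit = dot(params["w_out"], features) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit)), visit_weights

def rand_vec(rng, dim):
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

# Toy, untrained parameters and a toy vocabulary (codes "c1" to "c5").
rng = random.Random(0)
DIM = 4
embeddings = {c: rand_vec(rng, DIM) for c in ["c1", "c2", "c3", "c4", "c5"]}
params = {
    "code_query": rand_vec(rng, DIM),
    "visit_query": rand_vec(rng, DIM),
    "kernel": [0.25, 0.5, 0.25],
    "w_out": rand_vec(rng, 3 * DIM),
    "b_out": 0.0,
}
patient = [["c1", "c2"], ["c3"], ["c2", "c4", "c5"]]  # three time-ordered visits
prob, visit_weights = risk_score(patient, embeddings, params)
print(f"risk = {prob:.3f}, visit weights = {[round(w, 3) for w in visit_weights]}")
```

Because the code-level weights are computed before either temporal pathway runs, a code with low relevance to the query contributes little to its visit vector, which is the intuition behind filtering noisy within-visit information. In a trained model, the queries, embeddings, kernel, and output weights would be learned rather than sampled at random.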

Supplemental Material

3340531.3411864.mp4 (MP4, 7.4 MB)

Index Terms

1. Applied computing
  1. Life and medical sciences
    1. Health informatics
2. Information systems
  1. Information systems applications
    1. Data mining

Published in
CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
October 2020
3619 pages
ISBN: 9781450368599
DOI: 10.1145/3340531
General Chairs:
Mathieu d'Aquin (DSI, Insight, NUI Galway, Ireland)
Stefan Dietze (GESIS, Cologne, Germany; Heinrich-Heine-University Düsseldorf, Germany; L3S Research Center, Germany)
Program Chairs:
Claudia Hauff (TU Delft, The Netherlands)
Edward Curry (DSI, Insight, NUI Galway, Ireland)
Philippe Cudre Mauroux (eXascale, University of Fribourg, Switzerland)
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
attention mechanism
data mining
electronic health records
temporal modeling

Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%

Article Metrics
- 26 total citations
- 720 total downloads
- Downloads (last 12 months): 107
- Downloads (last 6 weeks): 17