DOI: 10.1145/3340531.3411864

LSAN: Modeling Long-term Dependencies and Short-term Correlations with Hierarchical Attention for Risk Prediction

Published: 19 October 2020

ABSTRACT

Risk prediction with electronic health records (EHR) is a challenging data mining task because of the two-level hierarchical structure of EHR data: a set of time-ordered visits, each containing a set of unordered diagnosis codes. Existing approaches focus on modeling the temporal sequence of visits with deep neural network (DNN) techniques, but they ignore the importance of modeling the diagnosis codes within each visit, and the large amount of task-unrelated information inside visits often leads to unsatisfactory performance. To minimize the effect of such noisy information in EHR data, this paper proposes a novel DNN for risk prediction, termed LSAN, which consists of a Hierarchical Attention Module (HAM) and a Temporal Aggregation Module (TAM). LSAN applies HAM to model the hierarchical structure of EHR data. Using an attention mechanism at the diagnosis-code level, HAM retains diagnosis details and assigns flexible attention weights to different diagnosis codes according to their relevance to the target diseases. An attention mechanism at the visit level then learns a comprehensive feature over the whole visit history by paying greater attention to the more relevant visits. Building on the representations produced by HAM, TAM uses a two-pathway structure to learn a robust temporal aggregation mechanism across all visits: a Transformer encoder extracts long-term dependencies, while a parallel convolutional layer captures short-term correlations among neighboring visits. With HAM and TAM combined, LSAN achieves state-of-the-art performance on three real-world datasets with higher AUC, recall, and F1 scores. Furthermore, the model analysis demonstrates the effectiveness of the network design and the interpretability and robustness of LSAN's decision making.
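To make the described data flow concrete, the following is a minimal PyTorch sketch of an LSAN-style architecture as summarized in the abstract, assuming illustrative module names, dimensions, and a single binary risk label; it is not the authors' implementation. A code-level attention pools diagnosis-code embeddings into visit vectors (the lower hierarchy of HAM), and a Transformer-encoder pathway plus a parallel 1-D convolution model long-term dependencies and short-term correlations, fused by a visit-level attention before the risk classifier (TAM plus the upper hierarchy of HAM).

import torch
import torch.nn as nn


class CodeLevelAttention(nn.Module):
    """Attention-weighted pooling of diagnosis-code embeddings within each visit."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, codes, mask):
        # codes: (batch, visits, codes_per_visit, dim); mask marks real codes.
        logits = self.score(codes).squeeze(-1).masked_fill(~mask, float("-inf"))
        weights = torch.softmax(logits, dim=-1).unsqueeze(-1)
        return (weights * codes).sum(dim=-2)            # (batch, visits, dim)


class TwoPathwayAggregator(nn.Module):
    """Transformer pathway for long-term dependencies, Conv1d pathway for
    short-term correlations among neighboring visits, fused by visit attention."""
    def __init__(self, dim, heads=4, kernel=3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.long_term = nn.TransformerEncoder(layer, num_layers=2)
        self.short_term = nn.Conv1d(dim, dim, kernel, padding=kernel // 2)
        self.visit_attn = nn.Linear(2 * dim, 1)          # visit-level attention
        self.classifier = nn.Linear(2 * dim, 1)

    def forward(self, visits):
        # visits: (batch, visits, dim) produced by CodeLevelAttention.
        long = self.long_term(visits)
        short = self.short_term(visits.transpose(1, 2)).transpose(1, 2)
        fused = torch.cat([long, short], dim=-1)         # (batch, visits, 2*dim)
        weights = torch.softmax(self.visit_attn(fused).squeeze(-1), dim=-1)
        patient = (weights.unsqueeze(-1) * fused).sum(dim=1)
        return torch.sigmoid(self.classifier(patient))   # risk probability


# Example: 2 patients, 5 visits, up to 10 codes per visit, 64-dim embeddings.
codes = torch.randn(2, 5, 10, 64)
mask = torch.ones(2, 5, 10, dtype=torch.bool)
visit_repr = CodeLevelAttention(64)(codes, mask)
risk = TwoPathwayAggregator(64)(visit_repr)
print(risk.shape)  # torch.Size([2, 1])

The sketch only mirrors the overall structure; the paper's attention formulations, fusion details, and multi-disease prediction setup differ from this simplified single-label version.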


Supplemental Material

3340531.3411864.mp4 (MP4, 7.4 MB)


Published in

CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
October 2020, 3619 pages
ISBN: 9781450368599
DOI: 10.1145/3340531

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

Publication History

Published: 19 October 2020


        Qualifiers

        • research-article

Acceptance Rates

Overall acceptance rate: 1,861 of 8,427 submissions, 22%
