DOI: 10.1145/3447548.3470789

Advances in Mining Heterogeneous Healthcare Data

Published: 14 August 2021

Abstract

Thanks to the explosion of heterogeneous healthcare data and to advanced machine learning and data mining techniques, specifically deep learning methods, we now have an opportunity to make a difference in healthcare. In this tutorial, we will present state-of-the-art deep learning methods and their real-world applications, focusing on the unique characteristics of different types of healthcare data. The first half will introduce recent advances in mining structured healthcare data, including computational phenotyping, early disease detection and risk prediction, and treatment recommendation. The second half will focus on challenges specific to unstructured healthcare data and introduce advanced deep learning methods for automated ICD coding, understandable medical language translation, clinical trial mining, and medical report generation. This tutorial is intended for students, engineers, and researchers who are interested in applying deep learning methods to healthcare; prerequisite knowledge will be minimal. The tutorial will conclude with open problems and a Q&A session.

Published In

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
August 2021
4259 pages
ISBN: 9781450383325
DOI: 10.1145/3447548
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep learning
  2. electronic health records
  3. health analytics

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
