DOI: 10.1145/2339530.2339712
research-article

SympGraph: a framework for mining clinical notes through symptom relation graphs

Published: 12 August 2012

Abstract

As an integral part of Electronic Health Records (EHRs), clinical notes pose special challenges for EHR analysis because they are unstructured. In this paper, we present SympGraph, a general mining framework for modeling and analyzing symptom relationships in clinical notes.
A SympGraph has symptoms as nodes and co-occurrence relations between symptoms as edges, and can be constructed automatically by extracting symptoms from sequences of clinical notes for a large number of patients. We present an important clinical application of SympGraph: symptom expansion, which expands a given set of symptoms to other related symptoms by analyzing the underlying SympGraph structure. We further propose a matrix update algorithm that provides significant computational savings for dynamic updates to the graph. A comprehensive evaluation on 1 million longitudinal clinical notes covering 13K patients shows that static symptom expansion can successfully expand a set of known symptoms for a disease with a high rate of agreement with physician input (average precision 0.46), a 31% improvement over baseline co-occurrence-based methods. The experimental results also show that the expanded symptoms can serve as useful features for improving the AUC of disease diagnosis prediction, confirming the potential clinical value of our work.
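
The abstract does not spell out how symptom expansion is computed, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that each note has already been reduced to a list of symptom strings, that edges are weighted by raw co-occurrence counts, and that expansion is done with a random walk with restart from the seed symptoms (an assumption suggested by the "random walk" author tag). The function names `build_cooccurrence_graph` and `expand_symptoms` and the parameters `restart`, `iterations`, and `top_k` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(notes):
    """Build a weighted, undirected symptom co-occurrence graph.

    `notes` is an iterable of symptom collections, one per clinical note
    (assumed to be extracted beforehand). Returns {symptom: {neighbor: count}}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for symptoms in notes:
        # Count each unordered pair of distinct symptoms once per note.
        for a, b in combinations(sorted(set(symptoms)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return {node: dict(nbrs) for node, nbrs in graph.items()}


def expand_symptoms(graph, seeds, restart=0.15, iterations=50, top_k=3):
    """Rank non-seed symptoms by random walk with restart from the seeds.

    Assumption: this stands in for the paper's expansion step, which the
    abstract does not describe in detail.
    """
    seeds = [s for s in seeds if s in graph]
    if not seeds:
        return []
    restart_mass = {s: 1.0 / len(seeds) for s in seeds}
    scores = dict(restart_mass)
    for _ in range(iterations):
        nxt = defaultdict(float)
        for node, score in scores.items():
            total = sum(graph[node].values())
            # Move along edges in proportion to co-occurrence weight.
            for nbr, weight in graph[node].items():
                nxt[nbr] += (1.0 - restart) * score * weight / total
        # With probability `restart`, jump back to a seed symptom.
        for s, mass in restart_mass.items():
            nxt[s] += restart * mass
        scores = nxt
    ranked = sorted(
        ((n, sc) for n, sc in scores.items() if n not in restart_mass),
        key=lambda item: (-item[1], item[0]),
    )
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy, made-up notes purely for illustration.
    toy_notes = [
        ["dyspnea", "edema", "fatigue"],
        ["dyspnea", "edema"],
        ["cough", "fever"],
        ["fatigue", "cough"],
    ]
    toy_graph = build_cooccurrence_graph(toy_notes)
    for symptom, score in expand_symptoms(toy_graph, ["dyspnea"]):
        print(f"{symptom}\t{score:.3f}")
```

In this toy setup, symptoms that co-occur more often with the seed tend to rank higher, which is the intuition behind using graph structure rather than raw pairwise counts alone.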

Supplementary Material

JPG File (310_t_talk_12.jpg)
MP4 File (310_t_talk_12.mp4)

    Information

    Published In

    KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2012
    1616 pages
    ISBN:9781450314626
    DOI:10.1145/2339530
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. patient records
    2. physician notes
    3. random walk
    4. symptom graphs

    Qualifiers

    • Research-article

    Conference

    KDD '12

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Bibliometrics

    • Downloads (last 12 months): 21
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 28 Feb 2025

    Citations

    Cited By

    • (2023) Fusion Model for Tentative Diagnosis Inference Based on Clinical Narratives. Tsinghua Science and Technology, 28:4, pages 686-695. DOI: 10.26599/TST.2022.9010049. Online publication date: Aug-2023.
    • (2023) SymptomGraph: Identifying Symptom Clusters from Narrative Clinical Notes using Graph Clustering. Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, pages 518-527. DOI: 10.1145/3555776.3577685. Online publication date: 27-Mar-2023.
    • (2022) A Case Study on Coronary Heart Disease using Machine Learning Techniques. International Journal of Health Sciences and Pharmacy, pages 149-165. DOI: 10.47992/IJHSP.2581.6411.0091. Online publication date: 16-Nov-2022.
    • (2022) DyHealth. Proceedings of the VLDB Endowment, 15:12, pages 3445-3458. DOI: 10.14778/3554821.3554835. Online publication date: 1-Aug-2022.
    • (2022) Using classification and visualization to support clinical texts review in electronic clinical documentation. Proceedings of the 6th International Conference on Medical and Health Informatics, pages 78-84. DOI: 10.1145/3545729.3545746. Online publication date: 13-May-2022.
    • (2022) How Severe is Your COVID-19? Predicting SARS-CoV-2 Infection with Graph Attention Capsule Networks. 2022 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), pages 751-754. DOI: 10.1109/WI-IAT55865.2022.00121. Online publication date: Nov-2022.
    • (2021) A Computational Framework to Analyze the Associations Between Symptoms and Cancer Patient Attributes Post Chemotherapy Using EHR Data. IEEE Journal of Biomedical and Health Informatics, 25:11, pages 4098-4109. DOI: 10.1109/JBHI.2021.3117238. Online publication date: Nov-2021.
    • (2020) Graph-Based Natural Language Processing for the Pharmaceutical Industry. Provenance in Data Science, pages 75-110. DOI: 10.1007/978-3-030-67681-0_6. Online publication date: 28-Dec-2020.
    • (2019) Identifying Symptom Clusters in Breast Cancer and Colorectal Cancer Patients using EHR Data. Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, pages 405-413. DOI: 10.1145/3307339.3342164. Online publication date: 4-Sep-2019.
    • (2019) Naranjo Question Answering using End-to-End Multi-task Learning Model. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2547-2555. DOI: 10.1145/3292500.3330770. Online publication date: 25-Jul-2019.
