Daehr: A Discriminant Analysis Framework for Electronic Health Record Data and an Application to Early Detection of Mental Health Disorders

Published: 08 February 2017


Electronic health records (EHR) provide a rich source of temporal data that present a unique opportunity to characterize disease patterns and the risk of imminent disease. Many data-mining tools have been adopted for EHR-based early disease detection, and linear discriminant analysis (LDA) is one of the most commonly used statistical methods among them. However, it is difficult to train an accurate LDA model for early disease diagnosis when too few patients are known to have the target disease. Furthermore, EHR data are heterogeneous and contain significant noise. In such cases, the covariance matrices used in LDA are usually singular and estimated with a large variance.
This article presents Daehr, an extension of the LDA framework for electronic health record data that addresses these issues. Going beyond existing LDA analyzers, Daehr is designed to (1) eliminate the data noise caused by the manual encoding of EHR data and (2) lower the variance of parameter (covariance matrix) estimation for LDA models when only a few patients’ EHRs are available for training. To achieve these two goals, we designed an iterative algorithm that improves the covariance matrix estimation with embedded data-noise and parameter-variance reduction for LDA. We evaluated Daehr extensively using the College Health Surveillance Network, a large, real-world EHR dataset. Specifically, our experiments compared the performance of Daehr to three baselines (i.e., LDA and its derivatives) in identifying college students at high risk for mental health disorders from 23 U.S. universities. Experimental results demonstrate that Daehr significantly outperforms the three baselines, achieving 1.4% to 19.4% higher accuracy and a 7.5% to 43.5% higher F1-score.
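
The covariance problem mentioned in the abstract can be illustrated with a small numerical sketch. The code below is not Daehr's algorithm, whose details are not given here; it only shows, on synthetic data, that a sample covariance matrix estimated from fewer patients than features is singular, and that a generic shrinkage toward a scaled identity matrix (a common stand-in technique, assumed here for illustration) makes it invertible. The function names, the matrix sizes, and the shrinkage weight `alpha` are all hypothetical.

```python
import numpy as np

def sample_covariance(X):
    """Unbiased sample covariance of the rows of X (patients x features)."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / (X.shape[0] - 1)

def shrink_covariance(S, alpha):
    """Generic shrinkage toward a scaled identity (illustrative, not Daehr)."""
    p = S.shape[0]
    target = (np.trace(S) / p) * np.eye(p)
    return (1.0 - alpha) * S + alpha * target

# Synthetic stand-in for EHR features: far fewer patients than features.
rng = np.random.default_rng(0)
n_patients, n_features = 10, 50
X = rng.normal(size=(n_patients, n_features))

S = sample_covariance(X)
S_shrunk = shrink_covariance(S, alpha=0.1)  # alpha is an assumed value

# The raw estimate has rank at most n_patients - 1, so it cannot be inverted,
# which is what LDA needs. The shrunk estimate has full rank.
rank_S = np.linalg.matrix_rank(S)
rank_shrunk = np.linalg.matrix_rank(S_shrunk)
print(rank_S, rank_shrunk)
```

Because every eigenvalue of the shrunk matrix is bounded below by `alpha` times the average variance, the matrix is positive definite and can be inverted inside the LDA discriminant function, at the cost of some bias in the estimate.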



    ACM Transactions on Intelligent Systems and Technology  Volume 8, Issue 3
    Special Issue: Mobile Social Multimedia Analytics in the Big Data Era and Regular Papers
    May 2017
    320 pages
    Editor: Yu Zheng
    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Accepted: 01 October 2016
    Revised: 01 July 2016
    Received: 01 December 2015
    Published in TIST Volume 8, Issue 3


    Author Tags

    1. Predictive models
    2. anxiety/depression
    3. early detection
    4. electronic health data
    5. temporal order


    Funding Sources

    University of Virginia Hobby Postdoctoral and Predoctoral Fellowships in Computational Science


