DOI: 10.1145/2623330.2623658
research-article

Marble: high-throughput phenotyping from electronic health records via sparse nonnegative tensor factorization

Published: 24 August 2014

Abstract

The rapidly increasing availability of electronic health records (EHRs) from multiple heterogeneous sources has spearheaded the adoption of data-driven approaches for improved clinical research, decision making, prognosis, and patient management. Unfortunately, EHR data do not always directly and reliably map to phenotypes, or medical concepts, that clinical researchers need or use. Existing phenotyping approaches typically require labor-intensive supervision from medical experts. We propose Marble, a novel sparse non-negative tensor factorization method to derive phenotype candidates with virtually no human supervision. Marble decomposes the observed tensor into two terms, a bias tensor and an interaction tensor. The bias tensor represents the baseline characteristics common amongst the overall population, and the interaction tensor defines the phenotypes. We demonstrate the capability of our proposed model on both simulated data and patient data from a publicly available clinical database. Our results show that Marble-derived phenotypes provide at least a 42.8% reduction in the number of non-zero elements while retaining predictive power for classification purposes. Furthermore, the resulting phenotypes and baseline characteristics from real EHR data are consistent with known characteristics of the patient population. Thus, Marble can potentially be used to rapidly characterize, predict, and manage a large number of diseases, thereby promising a novel, data-driven solution that can benefit very large segments of the population.
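
The two-term decomposition described in the abstract can be illustrated with a toy example. The sketch below assumes, beyond what the abstract states, that the bias tensor is a single dense rank-one component and that the interaction tensor is a sum of sparse rank-one components in the CP/PARAFAC style. The tensor sizes, factor values, and helper names (`rank_one`, `add`, `count_nonzero`, `percent_reduction`) are hypothetical, and no fitting procedure is shown; it only builds a model tensor from given factors and measures factor sparsity.

```python
from itertools import product


def rank_one(weight, vectors):
    """Return weight * (v1 o v2 o ... o vN) as a dict keyed by index tuple."""
    tensor = {}
    for idx in product(*(range(len(v)) for v in vectors)):
        value = weight
        for mode, i in enumerate(idx):
            value *= vectors[mode][i]
        tensor[idx] = value
    return tensor


def add(a, b):
    """Element-wise sum of two tensors defined over the same index set."""
    return {idx: a[idx] + b[idx] for idx in a}


def count_nonzero(vectors):
    """Number of non-zero entries across a list of factor vectors."""
    return sum(1 for v in vectors for x in v if x != 0.0)


def percent_reduction(before, after):
    """Percentage drop in a count, e.g. in non-zero factor entries."""
    return 100.0 * (before - after) / before


# Hypothetical toy tensor: 3 patients x 4 diagnoses x 2 medications.
# Bias term (assumed rank-one): baseline characteristics shared by everyone.
bias = rank_one(0.5, [[1.0, 1.0, 1.0], [0.4, 0.3, 0.2, 0.1], [0.6, 0.4]])

# Interaction term (assumed CP form): each sparse component is one
# candidate phenotype, given as (weight, [patient, diagnosis, medication]).
phenotypes = [
    (2.0, [[1.0, 0.0, 0.0], [0.7, 0.3, 0.0, 0.0], [1.0, 0.0]]),
    (1.0, [[0.0, 0.5, 0.5], [0.0, 0.0, 0.5, 0.5], [0.0, 1.0]]),
]
interaction = {idx: 0.0 for idx in bias}
for weight, vectors in phenotypes:
    interaction = add(interaction, rank_one(weight, vectors))

# Model of the observed tensor: bias plus interaction, all non-negative.
model = add(bias, interaction)

# Sparsity of the phenotype factors relative to fully dense factors.
total_entries = sum(len(v) for _, vectors in phenotypes for v in vectors)
nonzeros = sum(count_nonzero(vectors) for _, vectors in phenotypes)
reduction = percent_reduction(total_entries, nonzeros)
print(f"{nonzeros} of {total_entries} factor entries are non-zero "
      f"({reduction:.1f}% fewer than dense factors)")
```

Storing the tensor as a dictionary keeps the sketch free of third-party dependencies; an array library would be the more natural choice at realistic sizes.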

Supplementary Material

MP4 File (p115-sidebyside.mp4)


    Published In

    KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2014
    2028 pages
    ISBN:9781450329569
    DOI:10.1145/2623330
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. EHR phenotyping
    2. application
    3. dimensionality reduction
    4. tensor

    Qualifiers

    • Research-article

    Conference

    KDD '14

    Acceptance Rates

    KDD '14 Paper Acceptance Rate 151 of 1,036 submissions, 15%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Article Metrics

    • Downloads (last 12 months): 48
    • Downloads (last 6 weeks): 4
    Reflects downloads up to 28 Feb 2025

    Cited By

    • (2024) The Role of Predictive Analytics in Disease Prevention: A Technical Overview. International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 10:6, pp. 321-331. DOI: 10.32628/CSEIT24106174. Online publication date: 8-Nov-2024
    • (2024) Neural Additive Tensor Decomposition for Sparse Tensors. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pp. 14-23. DOI: 10.1145/3627673.3679833. Online publication date: 21-Oct-2024
    • (2024) Tensor Kernel Learning for Classification of Alzheimer’s Conditions using Multimodal Data. 2024 International Conference on Multimedia Analysis and Pattern Recognition (MAPR), pp. 1-6. DOI: 10.1109/MAPR63514.2024.10661014. Online publication date: 15-Aug-2024
    • (2024) FedPAR: Federated PARAFAC2 tensor factorization for computational phenotyping. IISE Transactions on Healthcare Systems Engineering, 14:3, pp. 264-275. DOI: 10.1080/24725579.2024.2333261. Online publication date: 8-Apr-2024
    • (2024) Tensor decompositions for count data that leverage stochastic and deterministic optimization. Optimization Methods and Software, pp. 1-36. DOI: 10.1080/10556788.2024.2401981. Online publication date: 24-Sep-2024
    • (2023) VecoCare. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pp. 4921-4929. DOI: 10.24963/ijcai.2023/547. Online publication date: 19-Aug-2023
    • (2023) Creating High-Quality Synthetic Health Data: A Framework for Model Development and Validation (Preprint). JMIR Formative Research. DOI: 10.2196/53241. Online publication date: 2-Oct-2023
    • (2023) Performance Implication of Tensor Irregularity and Optimization for Distributed Tensor Decomposition. ACM Transactions on Parallel Computing, 10:2, pp. 1-27. DOI: 10.1145/3580315. Online publication date: 20-Jun-2023
    • (2023) SeqCare: Sequential Training with External Medical Knowledge Graph for Diagnosis Prediction in Healthcare Data. Proceedings of the ACM Web Conference 2023, pp. 2819-2830. DOI: 10.1145/3543507.3583543. Online publication date: 30-Apr-2023
    • (2023) Tensor Embedding: A Supervised Framework for Human Behavioral Data Mining and Prediction. 2023 IEEE 11th International Conference on Healthcare Informatics (ICHI), pp. 91-100. DOI: 10.1109/ICHI57859.2023.00023. Online publication date: 26-Jun-2023
