Extracting Phenotypes from Patient Claim Records Using Nonnegative Tensor Factorization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8609)

Abstract

Electronic health records (EHRs) are becoming an increasingly important source of patient information. Unfortunately, EHR data do not always map directly and reliably to the medical concepts that clinical researchers need or use. Some recent studies have focused on EHR-derived phenotyping, which aims to map EHR data to specific medical concepts; however, most of these approaches require labor-intensive supervision from experienced clinical professionals.

In this paper, we use Limestone, a nonnegative tensor factorization method, to derive phenotype candidates from claims data with virtually no human supervision. Limestone uses tensors (a generalization of matrices) to represent naturally the interactions between diagnoses and procedures among patients. The resulting tensor factors are reported as phenotype candidates that automatically reveal patient clusters on specific diagnoses and procedures. To the best of our knowledge, this is the first study that successfully extracts useful phenotypes by applying sparse nonnegative tensor factorization to a large, public-domain EHR dataset covering a broad range of diseases. Our experiments demonstrate the interpretability and the promise of high-throughput phenotypes generated from tensor factorization.
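
The tensor representation described above can be illustrated with a small sketch. The code below is not Limestone itself: it assumes a patient × diagnosis × procedure count tensor and swaps in a generic nonnegative CP factorization with multiplicative updates under a least-squares loss, with no sparsity step. The function names (build_tensor, nonneg_cp), the toy claims, and all parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def build_tensor(claims, n_patients, n_diagnoses, n_procedures):
    """Count how often each (patient, diagnosis, procedure) triple occurs."""
    X = np.zeros((n_patients, n_diagnoses, n_procedures))
    for p, d, r in claims:
        X[p, d, r] += 1.0
    return X

def nonneg_cp(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Nonnegative CP factorization via multiplicative updates.

    Illustrative stand-in (least-squares loss), not the Limestone algorithm.
    """
    rng = np.random.default_rng(seed)
    A = rng.random((X.shape[0], rank))  # patient factors
    B = rng.random((X.shape[1], rank))  # diagnosis factors
    C = rng.random((X.shape[2], rank))  # procedure factors
    for _ in range(n_iter):
        # Each update multiplies by a ratio of nonnegative terms,
        # so the factors remain nonnegative throughout.
        A *= np.einsum('ijk,jr,kr->ir', X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum('ijk,ir,kr->jr', X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum('ijk,ir,jr->kr', X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C

# Toy claims (hypothetical): patients 0-2 share diagnosis 0 and procedure 0,
# patients 3-5 share diagnosis 1 and procedure 1.
claims = [(p, 0, 0) for p in range(3)] + [(p, 1, 1) for p in range(3, 6)]
X = build_tensor(claims, n_patients=6, n_diagnoses=2, n_procedures=2)

A, B, C = nonneg_cp(X, rank=2)

# Reconstruct the tensor from the rank-one components and measure the fit.
X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print("relative reconstruction error:", rel_error)
```

In this sketch, each column triple (A[:, r], B[:, r], C[:, r]) plays the role of one phenotype candidate: the large entries of B[:, r] and C[:, r] pick out the diagnoses and procedures that co-occur, and A[:, r] indicates which patients express that pattern.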

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Ho, J.C., Ghosh, J., Sun, J. (2014). Extracting Phenotypes from Patient Claim Records Using Nonnegative Tensor Factorization. In: Ślȩzak, D., Tan, AH., Peters, J.F., Schwabe, L. (eds) Brain Informatics and Health. BIH 2014. Lecture Notes in Computer Science, vol 8609. Springer, Cham. https://doi.org/10.1007/978-3-319-09891-3_14

  • DOI: https://doi.org/10.1007/978-3-319-09891-3_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09890-6

  • Online ISBN: 978-3-319-09891-3

  • eBook Packages: Computer Science, Computer Science (R0)
