ABSTRACT
We apply deep learning to the problem of discovery and detection of characteristic patterns of physiology in clinical time series data. We propose two novel modifications to standard neural net training that address challenges and exploit properties that are peculiar, if not exclusive, to medical data. First, we examine a general framework for using prior knowledge to regularize parameters in the topmost layers. This framework can leverage priors of any form, ranging from formal ontologies (e.g., ICD-9 codes) to data-derived similarity. Second, we describe a scalable procedure for training a collection of neural networks of different sizes but with partially shared architectures. Both of these innovations are well-suited to medical applications, where the available data are not yet Internet scale and have many sparse outputs (e.g., rare diagnoses), but also have exploitable structure (e.g., temporal order and relationships between labels). However, both techniques are sufficiently general to be applied to other problems and domains. We demonstrate the empirical efficacy of both techniques on two real-world hospital data sets and show that the resulting neural nets learn interpretable and clinically relevant features.
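To make the first idea concrete, here is a minimal sketch of one way a label-similarity prior could regularize the topmost layer. The abstract does not specify the form of the regularizer, so this is an assumption: it uses a graph Laplacian penalty on the output-layer weights, and the function names, the similarity matrix `A`, and the weight matrix `W` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def graph_laplacian(similarity):
    """Unnormalized graph Laplacian L = D - A for a symmetric similarity matrix A."""
    A = np.asarray(similarity, dtype=float)
    return np.diag(A.sum(axis=1)) - A

def laplacian_penalty(W, similarity):
    """Penalty tr(W L W^T) on output-layer weights.

    W has shape (n_hidden, n_labels); column k holds the weights for label k.
    For symmetric A this equals 0.5 * sum_ij A[i, j] * ||w_i - w_j||^2, so
    labels the prior marks as similar are pulled toward similar weights.
    """
    return float(np.trace(W @ graph_laplacian(similarity) @ W.T))

def laplacian_penalty_grad(W, similarity):
    """Gradient of the penalty with respect to W (valid for symmetric A)."""
    return 2.0 * W @ graph_laplacian(similarity)

# Hypothetical prior: labels 0 and 1 are related (e.g., siblings in an
# ontology); label 2 is unrelated to both.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])

# Hypothetical top-layer weights: 2 hidden units, 3 labels.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 5.0]])

penalty = laplacian_penalty(W, A)   # only the 0-1 pair contributes
grad = laplacian_penalty_grad(W, A)
```

In training, a term like this would be scaled by a hyperparameter and added to the usual prediction loss. The similarity matrix could come from an ontology or from co-occurrence statistics, which matches the abstract's point that the prior can take any form.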
Index Terms
- Deep Computational Phenotyping