DOI: 10.1145/3107411.3107422

HEMnet: Integration of Electronic Medical Records with Molecular Interaction Networks and Domain Knowledge for Survival Analysis

Published: 20 August 2017


The continual growth of electronic medical record (EMR) databases has paved the way for many data mining applications, including the discovery of novel disease-drug associations and the prediction of patient survival rates. However, these tasks are hindered because EMRs are usually segmented or incomplete. EMR analysis is further limited by the overabundance of medical term synonyms and morphologies, which causes existing techniques to mismatch records containing semantically similar but lexically distinct terms. Current solutions fill in missing values with techniques that tend to introduce noise rather than reduce it. In this paper, we propose to simultaneously infer missing data and resolve semantic mismatching in EMRs by first integrating EMR data with molecular interaction networks and domain knowledge to build HEMnet, a heterogeneous medical information network. We then project this network onto a low-dimensional space and group entities in the network according to their relative distances. Lastly, we use this entity distance information to enrich the original EMRs. We evaluate the effectiveness of this method according to its ability to separate patients with dissimilar survival functions. We show that our method obtains statistically significant results (p-value < 0.01) for each cancer subtype in a lung cancer dataset, while the baselines do not.
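The enrichment step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine` and `enrich_record`, the use of cosine similarity as the distance measure, the `threshold` value, and the toy 2-D embeddings are all assumptions made for illustration. The idea shown is only that an entity missing from a patient record can be filled in when it lies close to an observed entity in the low-dimensional space, which is one way lexical variants of the same term could end up matched.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def enrich_record(record, embeddings, threshold=0.9):
    """Add entities that lie close to an observed entity in embedding space.

    record     -- dict mapping observed entity name -> weight
    embeddings -- dict mapping entity name -> vector (hypothetical input)
    threshold  -- minimum weighted similarity for an entity to be added
                  (an illustrative value, not taken from the paper)
    """
    enriched = dict(record)
    for candidate, cand_vec in embeddings.items():
        if candidate in record:
            continue  # keep observed values untouched
        best = 0.0
        for observed, weight in record.items():
            if observed in embeddings:
                sim = cosine(cand_vec, embeddings[observed])
                best = max(best, sim * weight)
        if best >= threshold:
            enriched[candidate] = best
    return enriched


# Made-up 2-D embeddings: two lexical variants of one symptom, one unrelated.
toy_embeddings = {
    "cough": [1.0, 0.0],
    "coughing": [0.99, 0.05],
    "fatigue": [0.0, 1.0],
}
patient = {"cough": 1.0}
enriched = enrich_record(patient, toy_embeddings)
print(sorted(enriched))  # prints ['cough', 'coughing']
```

In this toy case the record gains "coughing" because its vector nearly coincides with that of the observed "cough", while "fatigue" is orthogonal and stays out. A real pipeline would take its vectors from the network embedding and would need a threshold chosen on held-out data.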



Cited By

  • (2023) Classification Characteristics of COPD Based on Combination of Disease and Syndrome in Real World. 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 4667-4674. DOI: 10.1109/BIBM58861.2023.10385602. Online publication date: 5-Dec-2023
  • (2022) RESurv: A Deep Survival Analysis Model to Reveal Population Heterogeneity by Individual Risk. 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2751-2758. DOI: 10.1109/BIBM55620.2022.9995313. Online publication date: 6-Dec-2022



        Published In

        ACM-BCB '17: Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
        August 2017
        800 pages



        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. electronic medical records
        2. heterogeneous information networks
        3. network embedding


        • Research-article



        BCB '17

        Acceptance Rates

        ACM-BCB '17 Paper Acceptance Rate 42 of 132 submissions, 32%;
        Overall Acceptance Rate 254 of 885 submissions, 29%


        Article Metrics

        • Downloads (last 12 months): 77
        • Downloads (last 6 weeks): 7
        Reflects downloads up to 02 Mar 2025

