Abstract
Representation learning algorithms have recently led to significant progress in knowledge extraction from network structures. This paper proposes a representation learning framework for the medical diagnosis domain. It combines a heterogeneous network model of diagnostic data with an algorithm for learning latent node representations. A modification of the metapath2vec algorithm is also proposed for representation learning on heterogeneous networks. The proposed algorithm is compared with other representation learning approaches in two practical case studies: symptom/disease classification and disease prediction. Both tasks show a significant performance boost that results from representing the domain data as a heterogeneous network. In certain situations, the modified algorithm also improves the quality of the learned embeddings compared to the reference methods.
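To make the idea of walking a heterogeneous network concrete, the following is a minimal sketch of a metapath-guided random walk in the style of the original metapath2vec, not the modified algorithm proposed in this paper. The toy disease/symptom network, the node names, and the helper functions `build_adjacency` and `metapath_walk` are hypothetical and chosen only for illustration; the metapath is written here as a cyclic pattern of node types.

```python
import random

# Hypothetical toy network: node names, types, and edges are illustrative only
# and are not taken from the paper's data.
NODE_TYPE = {
    "flu": "disease",
    "cold": "disease",
    "fever": "symptom",
    "cough": "symptom",
    "sneezing": "symptom",
}
EDGES = [
    ("flu", "fever"),
    ("flu", "cough"),
    ("cold", "cough"),
    ("cold", "sneezing"),
]


def build_adjacency(edges):
    """Build an undirected adjacency list from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def metapath_walk(adj, node_type, start, metapath, walk_length, rng):
    """Random walk whose node types follow a cyclic metapath.

    `metapath` is a list of node types that repeats, for example
    ["disease", "symptom"] yields disease, symptom, disease, ...
    The walk stops early if no neighbor of the required type exists.
    """
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the first metapath type")
    walk = [start]
    while len(walk) < walk_length:
        wanted = metapath[len(walk) % len(metapath)]
        candidates = [n for n in adj.get(walk[-1], []) if node_type[n] == wanted]
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


adj = build_adjacency(EDGES)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
walk = metapath_walk(adj, NODE_TYPE, "flu", ["disease", "symptom"], 7, rng)
print(walk)
```

In a metapath2vec-style pipeline, many such walks would be generated and passed to a skip-gram model, so that nodes appearing in similar walk contexts receive similar embeddings. How the paper's modification changes this procedure is not stated in the abstract, so it is not reflected here.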
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Antczak, K. (2020). Representation Learning for Diagnostic Data. In: Saeed, K., Dvorský, J. (eds) Computer Information Systems and Industrial Management. CISIM 2020. Lecture Notes in Computer Science, vol 12133. Springer, Cham. https://doi.org/10.1007/978-3-030-47679-3_17
DOI: https://doi.org/10.1007/978-3-030-47679-3_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-47678-6
Online ISBN: 978-3-030-47679-3
eBook Packages: Computer Science (R0)