Abstract
With the rapid proliferation of scientific literature, it has become increasingly difficult for researchers to keep up with all published papers, especially in the biomedical fields, where thousands of citations are indexed every day. This has created a demand for algorithms that assist in literature search and discovery. A particular case is the literature related to SARS-CoV-2, where a large volume of papers was generated in a short span. As part of the 2021 Smoky Mountains Data Challenge, a COVID-19 knowledge graph constructed using links between concepts and papers from PubMed, Semantic MEDLINE, and CORD-19 was provided for analysis and knowledge mining. In this paper, we analyze this COVID-19 knowledge graph and implement various algorithms to predict as-yet-undiscovered links between concepts, by embedding concepts in Euclidean space and then performing link prediction with machine learning algorithms. Three embedding techniques, the Large-scale Information Network Embedding (LINE), the High-Order Proximity-preserved Embedding (HOPE), and the Structural Deep Network Embedding (SDNE), are implemented in conjunction with three machine learning algorithms (logistic regression, random forests, and feedforward neural networks). We also implement GraphSAGE, a framework for inductive representation learning on large graphs. Among these methods, SDNE combined with a feedforward neural network performed best, with an F1 score of 88.0%, followed by GraphSAGE with an F1 score of 86.3%. The predicted links are ranked using the PageRank product to assess the relative importance of predictions. Finally, we visualize the knowledge graphs and predictions to gain insight into the structure of the graph.
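The ranking step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "PageRank product" means the product of the PageRank scores of a predicted link's two endpoint concepts, which the abstract does not spell out. The toy graph, the candidate links, and the function names `pagerank` and `rank_by_pagerank_product` are hypothetical and are not taken from the authors' repository.

```python
def pagerank(adjacency, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected graph.

    `adjacency` maps each node to the set of its neighbors.
    """
    nodes = list(adjacency)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts with the teleportation share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            neighbors = adjacency[node]
            if neighbors:
                # Split this node's rank evenly among its neighbors.
                share = damping * rank[node] / len(neighbors)
                for neighbor in neighbors:
                    new_rank[neighbor] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


def rank_by_pagerank_product(adjacency, candidate_links):
    """Sort candidate links by the product of their endpoints' PageRank.

    Assumption: this is one plausible reading of "PageRank product".
    """
    scores = pagerank(adjacency)
    return sorted(
        candidate_links,
        key=lambda link: scores[link[0]] * scores[link[1]],
        reverse=True,
    )


# Hypothetical toy concept graph (not from the COVID-19 knowledge graph).
toy_graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}
# Hypothetical links that a classifier might have predicted.
predicted_links = [("B", "E"), ("B", "D"), ("C", "E")]

ranked_links = rank_by_pagerank_product(toy_graph, predicted_links)
print(ranked_links[0])
```

In this toy graph the link between B and D is ranked first, because D has a higher PageRank than the peripheral node E. Under this reading, a predicted link between two well-connected concepts is treated as more important than one involving a peripheral concept.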
Notes
1. GitHub repository: https://github.com/SaeelPai/GraphVizards2
References
Landhuis, E.: Scientific literature: information overload. Nature 535, 457–458 (2016). https://doi.org/10.1038/NJ7612-457A
MEDLINE PubMed Production Statistics. https://www.nlm.nih.gov/bsd/medline_pubmed_production_stats.html. Accessed 13 Sept 2021
Herrmannova, D., Kannan, R., Lim, S., Potok, T.E.: Finding novel links in COVID-19 knowledge graph. Smoky Mountains Data Challenge (2021). https://smc-datachallenge.ornl.gov/2021-challenge-2/. Accessed 13 Sept 2021
Kannan, R., et al.: Scalable knowledge graph analytics at 136 Petaflop/s. In: SC20, pp. 1–13. IEEE (2020)
Swanson, D.R., Smalheiser, N.R.: An interactive system for finding complementary literatures: a stimulus to scientific discovery. Artif. Intell. 91, 183–203 (1997)
Adamic, L.A., Adar, E.: Friends and neighbors on the web. Soc. Netw. 25, 211–230 (2003). https://doi.org/10.1016/S0378-8733(03)00009-1
Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank Citation Ranking: Bringing Order to the Web. Stanford InfoLab, Stanford (1999)
Liben-Nowell, D., Kleinberg, J.: The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. Technol. 58, 1019–1031 (2007). https://doi.org/10.1002/ASI.20591
Goyal, P., Ferrara, E.: Graph embedding techniques, applications, and performance: a survey. Knowl.-Based Syst. 151, 78–94 (2018). https://doi.org/10.1016/J.KNOSYS.2018.03.022
Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: LINE: large-scale information network embedding. In: WWW 2015 - Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077 (2015). https://doi.org/10.1145/2736277.2741093
Ou, M., Cui, P., Pei, J., Zhang, Z., Zhu, W.: Asymmetric transitivity preserving graph embedding. In: Proceedings 22nd ACM SIGKDD International Conference Knowledge Discovery Data Mining (2016). https://doi.org/10.1145/2939672
Wang, D., Cui, P., Zhu, W.: Structural deep network embedding. In: Proceedings 22nd ACM SIGKDD International Conference Knowledge Discovery Data Mining (2016). https://doi.org/10.1145/2939672
Hamilton, W.L., Ying, R., Leskovec, J.: Inductive representation learning on large graphs. In: 31st Conference Neural Information Processing Systems (2017)
Graphia—visualisation tool for the creation and analysis of graphs. https://graphia.app/. Accessed 13 Sept 2021
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Patel, A., Pai, S.S., Rajamohan, H.R., Bongarala, M., Samyak, R. (2022). Finding Novel Links in COVID-19 Knowledge Graph Using Graph Embedding Techniques. In: Nichols, J., et al. Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation. SMC 2021. Communications in Computer and Information Science, vol 1512. Springer, Cham. https://doi.org/10.1007/978-3-030-96498-6_26
DOI: https://doi.org/10.1007/978-3-030-96498-6_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96497-9
Online ISBN: 978-3-030-96498-6
eBook Packages: Computer Science (R0)