Abstract

With the rapid proliferation of scientific literature, it has become increasingly difficult for researchers to keep up with all published papers, especially in biomedical fields, where thousands of citations are indexed every day. This has created a demand for algorithms that assist in literature search and discovery. A particular case is the literature related to SARS-CoV-2, where a large volume of papers was generated in a short span of time. As part of the 2021 Smoky Mountains Data Challenge, a COVID-19 knowledge graph constructed from links between concepts and papers in PubMed, Semantic MEDLINE, and CORD-19 was provided for analysis and knowledge mining. In this paper, we analyze this COVID-19 knowledge graph and implement various algorithms to predict as-yet-undiscovered links between concepts. The approach embeds concepts in Euclidean space and then performs link prediction with machine learning algorithms. Three embedding techniques, the Large-scale Information Network Embedding (LINE), the High-Order Proximity-preserved Embedding (HOPE), and the Structural Deep Network Embedding (SDNE), are implemented in conjunction with three machine learning algorithms (logistic regression, random forests, and feed-forward neural networks). We also implement GraphSAGE, a framework for inductive representation learning on large graphs. Among these methods, SDNE in conjunction with a feed-forward neural network performed best, with an F1 score of 88.0%, followed by GraphSAGE with an F1 score of 86.3%. The predicted links are ranked using the PageRank product to assess the relative importance of the predictions. Finally, we visualize the knowledge graphs and predictions to gain insight into the structure of the graph.
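The ranking step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the "PageRank product", so the code below assumes it is the product of the PageRank scores of a predicted link's two endpoints. The function names, the undirected treatment of the graph, the damping factor of 0.85, and the concept names in the demo are all hypothetical choices made for illustration, not the authors' implementation.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def pagerank(edges: Iterable[Edge], damping: float = 0.85,
             iterations: int = 100) -> Dict[Hashable, float]:
    """Power-iteration PageRank on an undirected graph given as an edge list.

    Assumption: every node appears in at least one edge, so there are
    no dangling nodes and the scores always sum to 1.
    """
    neighbors: Dict[Hashable, Set[Hashable]] = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    n = len(neighbors)
    rank = {node: 1.0 / n for node in neighbors}
    for _ in range(iterations):
        new_rank = {}
        for node, nbrs in neighbors.items():
            # Each neighbor shares its score equally among its own neighbors.
            incoming = sum(rank[nb] / len(neighbors[nb]) for nb in nbrs)
            new_rank[node] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


def rank_links_by_pagerank_product(
    candidates: Iterable[Edge], rank: Dict[Hashable, float]
) -> List[Tuple[Edge, float]]:
    """Sort candidate links by the product of their endpoint PageRank scores."""
    scored = [((u, v), rank[u] * rank[v]) for u, v in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy concept graph; node names are placeholders.
    known_edges = [("covid", "fever"), ("covid", "cough"),
                   ("covid", "ace2"), ("ace2", "lung")]
    scores = pagerank(known_edges)
    # Hypothetical links that a classifier might have predicted.
    predicted = [("fever", "lung"), ("covid", "lung")]
    for link, score in rank_links_by_pagerank_product(predicted, scores):
        print(link, round(score, 4))
```

Under this assumed definition, a predicted link scores highly only when both of its endpoints are central concepts, which is one plausible way to surface predictions that touch well-connected parts of the graph.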

Notes

  1. GitHub repository: https://github.com/SaeelPai/GraphVizards2.

References

  1. Landhuis, E.: Scientific literature: information overload. Nature 535, 457–458 (2016). https://doi.org/10.1038/NJ7612-457A

  2. MEDLINE PubMed Production Statistics. https://www.nlm.nih.gov/bsd/medline_pubmed_production_stats.html. Accessed 13 Sept 2021

  3. Herrmannova, D., Kannan, R., Lim, S., Potok, T.E.: Finding novel links in COVID-19 knowledge graph; Smoky Mountains Data Challenge (2021). https://smc-datachallenge.ornl.gov/2021-challenge-2/. Accessed 13 Sept 2021

  4. Kannan, R., et al.: Scalable knowledge graph analytics at 136 Petaflop/s. In: SC20, pp. 1–13. IEEE (2020)

  5. Swanson, D.R., Smalheiser, N.R.: An interactive system for finding complementary literatures: a stimulus to scientific discovery. Artif. Intell. 91, 183–203 (1997)

  6. Adamic, L.A., Adar, E.: Friends and neighbors on the web. Soc. Netw. 25, 211–230 (2003). https://doi.org/10.1016/S0378-8733(03)00009-1

  7. Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank Citation Ranking: Bringing Order to the Web. Stanford InfoLab, Stanford (1999)

  8. Liben-Nowell, D., Kleinberg, J.: The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. Technol. 58, 1019–1031 (2007). https://doi.org/10.1002/ASI.20591

  9. Goyal, P., Ferrara, E.: Graph embedding techniques, applications, and performance: a survey. Knowl.-Based Syst. 151, 78–94 (2018). https://doi.org/10.1016/J.KNOSYS.2018.03.022

  10. Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: LINE: large-scale information network embedding. In: WWW 2015 - Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077 (2015). https://doi.org/10.1145/2736277.2741093

  11. Ou, M., Cui, P., Pei, J., Zhang, Z., Zhu, W.: Asymmetric transitivity preserving graph embedding. In: Proceedings 22nd ACM SIGKDD International Conference Knowledge Discovery Data Mining (2016). https://doi.org/10.1145/2939672

  12. Wang, D., Cui, P., Zhu, W.: Structural deep network embedding. In: Proceedings 22nd ACM SIGKDD International Conference Knowledge Discovery Data Mining (2016). https://doi.org/10.1145/2939672

  13. Hamilton, W.L., Ying, R., Leskovec, J.: Inductive representation learning on large graphs. In: 31st Conference Neural Information Processing Systems (2017)

  14. Graphia—visualisation tool for the creation and analysis of graphs. https://graphia.app/. Accessed 13 Sept 2021

Corresponding author

Correspondence to Saeel Shrivallabh Pai.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Patel, A., Pai, S.S., Rajamohan, H.R., Bongarala, M., Samyak, R. (2022). Finding Novel Links in COVID-19 Knowledge Graph Using Graph Embedding Techniques. In: Nichols, J., et al. Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation. SMC 2021. Communications in Computer and Information Science, vol 1512. Springer, Cham. https://doi.org/10.1007/978-3-030-96498-6_26

  • DOI: https://doi.org/10.1007/978-3-030-96498-6_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-96497-9

  • Online ISBN: 978-3-030-96498-6

  • eBook Packages: Computer Science, Computer Science (R0)
