On the Utilization of Structural and Textual Information of a Scientific Knowledge Graph to Discover Future Research Collaborations: A Link Prediction Perspective

  • Conference paper
Discovery Science (DS 2020)

Abstract

We treat the discovery of future research collaborations as a link prediction problem on scientific knowledge graphs. Our approach integrates both structured data and unstructured textual data into a single knowledge graph through a novel representation of multiple scientific documents. The proposed scientific knowledge graph is stored in the Neo4j graph database, and the approach is implemented in Python with the scikit-learn machine learning library. We benchmark the approach against classical link prediction algorithms, using accuracy, recall, and precision as performance metrics. Our initial experiments, which use the new COVID-19 Open Research Dataset, show a significant improvement in the accuracy of the future collaboration prediction task.
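To make the setting concrete, the following is a minimal sketch of classical link prediction on a co-authorship graph, scored with the three metrics named above. It is not the authors' implementation: the toy graph, the `future_links` labels, the helper names, and the fixed Jaccard threshold are all assumptions made for illustration, and the threshold rule merely stands in for a trained classifier such as one from scikit-learn. It uses only the Python standard library.

```python
import math
from itertools import combinations

# Hypothetical toy co-authorship graph: author -> set of current co-authors.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}

# Hypothetical ground truth: pairs that go on to collaborate later.
future_links = {("a", "e"), ("b", "d")}


def common_neighbors(g, u, v):
    """Number of co-authors shared by u and v."""
    return len(g[u] & g[v])


def jaccard(g, u, v):
    """Shared co-authors divided by all co-authors of either author."""
    union = g[u] | g[v]
    return len(g[u] & g[v]) / len(union) if union else 0.0


def adamic_adar(g, u, v):
    """Shared co-authors, each weighted by 1 / log(degree)."""
    return sum(1.0 / math.log(len(g[z])) for z in g[u] & g[v] if len(g[z]) > 1)


def candidate_pairs(g):
    """All author pairs that are not yet connected, in sorted order."""
    return [(u, v) for u, v in combinations(sorted(g), 2) if v not in g[u]]


def predict(g, pairs, threshold=0.5):
    """Assumed decision rule: predict a link when Jaccard >= threshold."""
    return [int(jaccard(g, u, v) >= threshold) for u, v in pairs]


def score(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


pairs = candidate_pairs(graph)
y_true = [int(pair in future_links) for pair in pairs]
y_pred = predict(graph, pairs)
accuracy, precision, recall = score(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In a fuller pipeline, the three structural scores would typically be combined with text-derived features into one feature vector per candidate pair and passed to a supervised classifier; the single-feature threshold here only keeps the example short.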



Acknowledgments

The work presented in this paper is supported by the OpenBio-C project (www.openbio.eu), which is co-financed by the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH – CREATE – INNOVATE (Project id: T1EDK-05275). The authors would also like to thank Stamatis Karlos for his assistance with the statistical analysis of the data.

Author information

Corresponding author

Correspondence to Nikos Karacapilidis.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Giarelis, N., Kanakaris, N., Karacapilidis, N. (2020). On the Utilization of Structural and Textual Information of a Scientific Knowledge Graph to Discover Future Research Collaborations: A Link Prediction Perspective. In: Appice, A., Tsoumakas, G., Manolopoulos, Y., Matwin, S. (eds) Discovery Science. DS 2020. Lecture Notes in Computer Science, vol 12323. Springer, Cham. https://doi.org/10.1007/978-3-030-61527-7_29

  • DOI: https://doi.org/10.1007/978-3-030-61527-7_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61526-0

  • Online ISBN: 978-3-030-61527-7

  • eBook Packages: Computer Science, Computer Science (R0)
