Link prediction in complex networks using node centrality and light gradient boosting machine

World Wide Web

Abstract

Link prediction is among the most crucial tasks in network science and graph data analytics. Given a snapshot of a network at a particular instant, link prediction aims to predict the links that are likely to form between the nodes of the network in the future. It finds applications in recommender systems, traffic prediction in networks, biological interactions, and many other areas. In this paper, we propose a novel, generic approach to link prediction based on various node centralities and different machine learning classifiers. We use several popular and recently introduced node centralities to better capture the network’s local, quasi-local, and global structure. The values of these node centralities act as the features of the nodes in the network. Existing and non-existent edges in the network are labeled as positive and negative samples, respectively. The features of the two end nodes of each edge, together with the positive or negative label, form a well-defined dataset for the task of link prediction. The dataset is then fed into various machine learning classifiers, and the best results are obtained with the Light Gradient Boosting Machine (LGBM) classifier. We evaluate the proposed model on multiple real-life networks using various performance metrics and find that our approach outperforms many popular and recently proposed link prediction methods.
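
The dataset construction described in the abstract can be illustrated with a minimal sketch that uses only the Python standard library. The toy graph, the choice of degree and closeness as the two centralities, and the enumeration of every non-edge as a negative sample are all illustrative assumptions; the abstract does not state which centralities, sampling scheme, or classifier settings the paper actually uses.

```python
from collections import deque
from itertools import combinations

# Toy undirected graph (hypothetical example, not taken from the paper).
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]


def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree, n - 1."""
    n = len(adj)
    return {u: len(nbrs) / (n - 1) for u, nbrs in adj.items()}


def closeness_centrality(adj):
    """Number of reachable nodes divided by the sum of BFS distances to them."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def build_dataset(edges):
    """Build one sample per node pair: endpoint centralities plus a 0/1 label."""
    adj = build_adjacency(edges)
    centralities = [degree_centrality(adj), closeness_centrality(adj)]
    # Per-node feature vector: one value per centrality measure.
    features = {u: [c[u] for c in centralities] for u in adj}
    X, y = [], []
    for u, v in combinations(sorted(adj), 2):
        X.append(features[u] + features[v])  # concatenate endpoint features
        y.append(1 if v in adj[u] else 0)    # 1 = existing edge, 0 = non-edge
    return X, y


X, y = build_dataset(EDGES)
```

Here `X` and `y` have the shape a binary classifier expects, so they could be passed to a gradient boosting classifier such as LGBM. On real networks the non-edges vastly outnumber the edges, so some form of negative sampling would normally replace the exhaustive enumeration used in this sketch.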

Author information

Corresponding author

Correspondence to Sanjay Kumar.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Computational Aspects of Network Science. Guest Editors: Apostolos N. Papadopoulos and Richard Chbeir.

About this article

Cite this article

Kumar, S., Mallik, A. & Panda, B.S. Link prediction in complex networks using node centrality and light gradient boosting machine. World Wide Web 25, 2487–2513 (2022). https://doi.org/10.1007/s11280-021-01000-3

  • DOI: https://doi.org/10.1007/s11280-021-01000-3
