Abstract
Link prediction is among the most important tasks in network science and graph data analytics. Given a snapshot of a network at a particular instant of time, link prediction aims to predict which links are likely to form between the nodes of the network in the future. It finds applications in recommender systems, traffic prediction in networks, biological interactions, and many other areas. In this paper, we propose a novel generic approach to link prediction that combines various node centralities with different machine learning classifiers. We utilize some popular and recently introduced node centralities to better capture the network's local, quasi-local and global structure. The values of these node centralities serve as the features of the nodes in the network. Existent and non-existent edges in the network are labeled as positive and negative samples, respectively. The features of the two end nodes of each edge, together with the positive or negative label, form a well-defined dataset for the task of link prediction. The dataset is then fed into various machine learning classifiers, and the best results are obtained with the Light Gradient Boosted Machine (LGBM) classifier. We investigate the performance of the proposed model on multiple real-life networks using various performance metrics and show that our approach outperforms many popular and recently proposed link prediction methods.
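The dataset construction described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not list the centralities used, so degree and closeness centrality stand in for them; the toy edge list, the helper names (`degree_centrality`, `closeness_centrality`, `build_dataset`), and the choice to use every non-adjacent node pair as a negative sample are all hypothetical.

```python
from collections import deque
from itertools import combinations


def degree_centrality(adj):
    """Fraction of the other nodes that each node is adjacent to."""
    n = len(adj)
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}


def closeness_centrality(adj):
    """Reachable node count divided by the sum of BFS distances to them."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


def build_dataset(edges):
    """Return (X, y): one row of end-node features per node pair, plus a label."""
    nodes = sorted({v for edge in edges for v in edge})
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    # Assumption: two stand-in centralities form the node feature vector.
    deg = degree_centrality(adj)
    clo = closeness_centrality(adj)
    feats = {v: (deg[v], clo[v]) for v in nodes}

    edge_set = {(min(a, b), max(a, b)) for a, b in edges}
    X, y = [], []
    for u, v in combinations(nodes, 2):
        # Features of both end nodes, concatenated into a single row.
        X.append(feats[u] + feats[v])
        # Existent edge -> positive sample (1); non-existent -> negative (0).
        y.append(1 if (u, v) in edge_set else 0)
    return X, y


# Hypothetical toy network: a triangle (0, 1, 2) with a tail 2-3-4.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
X, y = build_dataset(edges)
print(len(X), "samples,", sum(y), "positive")
```

The resulting `X` and `y` are plain lists that could be handed to any binary classifier, for instance LightGBM's `LGBMClassifier` via `fit(X, y)`. On real networks the non-edges vastly outnumber the edges, so negative samples would normally be subsampled rather than enumerated exhaustively as in this toy example.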
This article belongs to the Topical Collection: Special Issue on Computational Aspects of Network Science Guest Editors: Apostolos N. Papadopoulos and Richard Chbeir
Cite this article
Kumar, S., Mallik, A. & Panda, B.S. Link prediction in complex networks using node centrality and light gradient boosting machine. World Wide Web 25, 2487–2513 (2022). https://doi.org/10.1007/s11280-021-01000-3
DOI: https://doi.org/10.1007/s11280-021-01000-3