Abstract
Link prediction finds missing links in static networks or future (new) links in dynamic networks, and its study is crucial to analyzing how networks evolve. In the last decade, many works on link prediction in social networks have been presented, and link prediction has played a pivotal role in the analysis of complex networks, including social and biological networks. In this work, we propose a new approach to link prediction based on a level-2 node clustering coefficient. The approach defines the notion of a level-2 common node and its corresponding clustering coefficient, which captures the clustering information of the level-2 common neighbors of the seed node pair; the similarity score is then computed from this information. We simulated the existing methods, namely three classical methods (common neighbors, resource allocation, and preferential attachment), the clustering coefficient-based methods CCLP and NLC, local naive Bayes-based common neighbors (LNBCN), Cannistraci-Alanis-Ravasi (CAR), and the recent Node2vec method, together with the proposed method, on 11 real-world network datasets. Accuracy is estimated in terms of four well-known single-point summary statistics: area under the ROC curve (AUROC), area under the precision-recall curve (AUPR), average precision, and recall. Comprehensive experiments on the four metrics and 11 datasets show that the proposed method performs better. The time complexity of the proposed method is also given and is of the same order as that of the existing method CCLP. A statistical test (the Friedman test) confirms that the proposed method differs significantly from the existing methods considered in the paper.
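For reference, the classical local indices used as baselines, plus a CCLP-style score (the sum of the local clustering coefficients of the common neighbors, following Wu et al.), can be sketched as below. This is a minimal illustration, assuming an undirected, unweighted graph stored as a dictionary of neighbor sets; all function names are hypothetical. It does not implement the proposed level-2 coefficient, whose exact definition is not reproduced in this abstract.

```python
from itertools import combinations

# Assumed representation: undirected, unweighted graph as an adjacency map.
Graph = dict[int, set[int]]


def build_graph(edges: list[tuple[int, int]]) -> Graph:
    """Build a symmetric adjacency map from an undirected edge list."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def clustering(g: Graph, z: int) -> float:
    """Local clustering coefficient of z: linked neighbor pairs / all neighbor pairs."""
    k = len(g[z])
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(g[z], 2) if v in g[u])
    return 2.0 * linked / (k * (k - 1))


def common_neighbors(g: Graph, x: int, y: int) -> int:
    return len(g[x] & g[y])


def resource_allocation(g: Graph, x: int, y: int) -> float:
    # Each common neighbor z passes 1/deg(z) units of "resource" from x to y.
    return sum(1.0 / len(g[z]) for z in g[x] & g[y])


def preferential_attachment(g: Graph, x: int, y: int) -> int:
    return len(g[x]) * len(g[y])


def cclp_style(g: Graph, x: int, y: int) -> float:
    # Sum of the clustering coefficients of the (level-1) common neighbors.
    return sum(clustering(g, z) for z in g[x] & g[y])


g = build_graph([(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)])
for score in (common_neighbors, resource_allocation, preferential_attachment, cclp_style):
    print(score.__name__, score(g, 0, 3))
```

For the unlinked pair (0, 3) in this toy graph, the common neighbors are 1 and 2 (each of degree 3 with clustering coefficient 2/3), giving CN = 2, RA = 2/3, PA = 6 and a CCLP-style score of 4/3.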
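The evaluation metrics named in the abstract can be made concrete with a small sketch. It assumes the common link-prediction conventions: AUROC as the probability that a held-out (missing) link scores higher than a non-existent link, with ties counting one half; average precision as the mean of precision@k over the ranks where true links appear; and recall over the top-k ranked pairs. The paper's exact definitions and cut-offs are not given here, and the function names are hypothetical.

```python
def auroc(pos: list[float], neg: list[float]) -> float:
    """Probability that a missing link outscores a non-existent link; ties count 1/2."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def _ranked(scores: list[float], labels: list[int]) -> list[tuple[float, int]]:
    # Sort candidate pairs by descending score; equal scores keep their input order.
    return sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)


def average_precision(scores: list[float], labels: list[int]) -> float:
    """Mean of precision@k taken at each rank k where a true link appears."""
    hits, total = 0, 0.0
    for k, (_, is_link) in enumerate(_ranked(scores, labels), start=1):
        if is_link:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


def recall_at_k(scores: list[float], labels: list[int], k: int) -> float:
    """Share of all true links that appear among the k top-scored pairs."""
    top = _ranked(scores, labels)[:k]
    return sum(label for _, label in top) / sum(labels)


print(auroc([0.9, 0.4], [0.5, 0.1, 0.4]))
print(average_precision([0.9, 0.5, 0.4, 0.1], [1, 0, 1, 0]))
print(recall_at_k([0.9, 0.5, 0.4, 0.1], [1, 0, 1, 0], k=2))
```

In the AUROC call, the link scored 0.9 beats all three negatives and the link scored 0.4 beats one and ties one, so the result is 4.5 / 6 = 0.75. The exhaustive pairwise loop is quadratic; with many candidate pairs, a sampled comparison is often used instead.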
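The Friedman test ranks the compared methods on each dataset and tests whether their average ranks differ more than chance would allow. A minimal sketch of the statistic, in the form described by Demšar, is shown below: with N datasets, k methods and average ranks R_j, chi2_F = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4), which is compared with a chi-square distribution with k - 1 degrees of freedom. Which metric the paper ranks on, and any post-hoc test it applies, are not stated in this abstract; the function names and toy scores are hypothetical.

```python
def average_ranks(row: list[float]) -> list[float]:
    """Rank methods on one dataset: 1 = best (highest score); ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied scores.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(table: list[list[float]]) -> tuple[float, list[float]]:
    """table[d][j] is the score of method j on dataset d; returns (chi2_F, average ranks)."""
    n, k = len(table), len(table[0])
    per_dataset = [average_ranks(row) for row in table]
    avg = [sum(r[j] for r in per_dataset) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg) - k * (k + 1) ** 2 / 4)
    return chi2, avg


scores = [[0.90, 0.80, 0.70], [0.95, 0.85, 0.60], [0.88, 0.70, 0.65]]
print(friedman_statistic(scores))
```

Here every dataset ranks the three methods identically, so the average ranks are 1, 2 and 3 and chi2_F = 6.0, just above the 5% critical value of about 5.991 for 2 degrees of freedom.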
References
Liben-Nowell D, Kleinberg J The link-prediction problem for social networks. J Am Soc Inf Sci Technol
Adafre SF, de Rijke M Discovering missing links in wikipedia. In: Proceedings of the 3rd international workshop on link discovery, LinkKDD ’05, pp 90–97
Zhu J, Hong J, Hughes JG Using Markov models for web site link prediction. In: Proceedings of the thirteenth ACM conference on hypertext and hypermedia, HYPERTEXT ’02, pp 169–170
Huang Z, Li X, Chen H Link prediction approach to collaborative filtering. In: Proceedings of the 5th ACM/IEEE-CS joint conference on digital libraries, JCDL ’05, pp 141–142
Airoldi E, Blei D, Xing E, Fienberg S Mixed membership stochastic block models for relational data, with applications to protein-protein interactions. In: Proceedings of international biometric society-ENAR annual meetings
Newman MEJ (2001) Clustering and preferential attachment in growing networks. Phys Rev E 64:025102. https://doi.org/10.1103/PhysRevE.64.025102
Jaccard P (1901) Distribution de la flore alpine dans le bassin des dranses et dans quelques régions voisines. Bull Soc Vaudoise Sci Nat 37:241–272
Adamic LA, Adar E (2003) Friends and neighbors on the web. Soc Netw 25:211–230. https://doi.org/10.1016/S0378-8733(03)00009-1
Zhou T, Lu L, Zhang Y-C (2009) Predicting missing links via local information. Europ Phys J B 71:623–630. https://doi.org/10.1140/epjb/e2009-00335-8
Barabasi A, Jeong H, Neda Z, Ravasz E, Schubert A, Vicsek T (2002) Evolution of the social network of scientific collaborations. Physica A Stat Mech Appl 311:590–614. https://doi.org/10.1016/S0378-4371(02)00736-7
Katz L (1953) A new status index derived from sociometric analysis. Psychometrika 18(1):39–43
Liu W, Lü L (2010) Link prediction based on local random walk. EPL (Europhys Lett) 89(5):58007. http://stacks.iop.org/0295-5075/89/i=5/a=58007
Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. In: Proceedings of the seventh international conference on World Wide Web 7, WWW7. Elsevier Science Publishers B. V., Amsterdam, pp 107–117. http://dl.acm.org/citation.cfm?id=297805.297827
Leicht EA, Holme P, Newman MEJ (2006) Vertex similarity in networks. Phys Rev E 73:026120. https://doi.org/10.1103/PhysRevE.73.026120
Tong H, Faloutsos C, Pan J-Y (2006) Fast random walk with restart and its applications. In: Proceedings of the sixth international conference on data mining, ICDM ’06. IEEE Computer Society, Washington, pp 613–622. https://doi.org/10.1109/ICDM.2006.70
Wu Z, Lin Y, Wang J, Gregory S (2016) Link prediction with node clustering coefficient. Physica A: Stat Mech Appl 452:1–8. https://doi.org/10.1016/j.physa.2016.01.038
Liu Y, Zhao C, Wang X, Huang Q, Zhang X, Yi D (2016) The degree-related clustering coefficient and its application to link prediction. Physica A: Stat Mech Appl 454:24–33. https://doi.org/10.1016/j.physa.2016.02.014
Wu Z, Lin Y, Wan H, Jamil W (2016) Predicting top-L missing links with node and link clustering information in large-scale networks. J Stat Mech Theory Exper 8:083202. https://doi.org/10.1088/1742-5468/2016/08/083202
Hasan MA, Chaoji V, Salem S, Zaki M (2006) Link prediction using supervised learning. In: Proc. of SDM 06 workshop on link analysis, counterterrorism and security
Popescul A, Popescul R, Ungar LH (2003) Statistical relational learning for link prediction
Popescul A, Popescul R, Ungar LH (2003) Structural logistic regression for link analysis
Taskar B, Wong M-F, Abbeel P, Koller D (2003) Link prediction in relational data. In: Proceedings of the 16th international conference on neural information processing systems, NIPS’03. MIT Press, Cambridge, pp 659–666. http://dl.acm.org/citation.cfm?id=2981345.2981428
Sarukkai RR (2000) Link prediction and path analysis using Markov chains. Comput Netw 33(1–6):377–386
Shapiro EY (1983) Algorithmic program debugging. MIT Press, Cambridge
Getoor L, Friedman N, Koller D, Taskar B (2002) Learning probabilistic models of link structure. J Mach Learn Res 3:679–707
Nallapati RM, Ahmed A, Xing EP, Cohen WW Joint latent topic models for text and citations. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’08, pp 542–550
Fu W, Song L, Xing EP Dynamic mixed membership blockmodel for evolving networks. In: Proceedings of the 26th annual international conference on machine learning, ICML ’09, pp 329–336
Xu Z, Tresp V, Yu S, Yu K (2008) Nonparametric relational learning for social network analysis. In: KDD’2008 Workshop on social network mining and analysis
Belkin M, Niyogi P (2001) Laplacian eigenmaps and spectral techniques for embedding and clustering. In: Proceedings of the 14th international conference on neural information processing systems: natural and synthetic, NIPS’01. MIT Press, Cambridge, pp 585–591. http://dl.acm.org/citation.cfm?id=2980539.2980616
Grover A, Leskovec J (2016) Node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’16. ACM, New York, pp 855–864. https://doi.org/10.1145/2939672.2939754
Kazemi SM, Poole D SimplE embedding for link prediction in knowledge graphs. arXiv:1802.04868
Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’14. ACM, New York, pp 701–710. https://doi.org/10.1145/2623330.2623732
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326. https://doi.org/10.1126/science.290.5500.2323. http://science.sciencemag.org/content/290/5500/2323
Mikolov T, Chen K, Corrado G, Dean J Efficient estimation of word representations in vector space. arXiv:1301.3781
Mikolov T, Sutskever I, Chen K, Corrado G, Dean J Distributed representations of words and phrases and their compositionality. arXiv:1310.4546
Tylenda T, Angelova R, Bedathur S Towards time-aware link prediction in evolving social networks. In: Proceedings of the 3rd workshop on social network mining and analysis, SNA-KDD ’09, pp 9:1–9:10
Song HH, Cho TW, Dave V, Zhang Y, Qiu L Scalable proximity estimation and link prediction in online social networks. In: Proceedings of the 9th ACM SIGCOMM conference on internet measurement, IMC ’09, pp 322–335
Acar E, Dunlavy DM, Kolda TG (2009) Link prediction on evolving data using matrix and tensor factorizations. In: 2009 IEEE International conference on data mining workshops, pp 262–269. https://doi.org/10.1109/ICDMW.2009.54
Huang Z (2006) Link prediction based on graph topology: the predictive value of generalized clustering coefficient
Cannistraci CV, Alanis-Lobato G, Ravasi T (2013) From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks. Sci Rep 3:1613. https://doi.org/10.1038/srep01613
Albert R, Barabási A-L (2002) Statistical mechanics of complex networks. Rev Mod Phys 74:47–97. https://doi.org/10.1103/RevModPhys.74.47
Watts DJ, Strogatz SH (1998) Collective dynamics of ’small-world’ networks. Nature 393(6684):440–442. https://doi.org/10.1038/30918
Kleinberg JM (2000) Navigation in a small world. Nature 406(6798):845
Milgram S (1967) The small world problem. Psychol Today 2:60–67
Barabasi A-L, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512. https://doi.org/10.1126/science.286.5439.509. http://science.sciencemag.org/content/286/5439/509
Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New York
Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (roc) curve. Radiology 143(1):29–36. https://doi.org/10.1148/radiology.143.1.7063747
Fawcett T (2006) An introduction to roc analysis. Pattern Recogn Lett 27(8):861–874. https://doi.org/10.1016/j.patrec.2005.10.010
Davis J, Goadrich M (2006) The relationship between precision-recall and roc curves. In: Proceedings of the 23rd international conference on machine learning, ICML ’06. ACM, New York, pp 233–240. https://doi.org/10.1145/1143844.1143874
Saito T, Rehmsmeier M (2015) The precision-recall plot is more informative than the roc plot when evaluating binary classifiers on imbalanced datasets. PLOS ONE 10:1–21. https://doi.org/10.1371/journal.pone.0118432
Markov NT, Ercsey-Ravasz MM, Ribeiro Gomes AR, Lamy C, Magrou L, Vezoli J, Misery P, Falchier A, Quilodran R, Gariel MA, Sallet J, Gamanut R, Huissoud C, Clavagnier S, Giroud P, Sappey-Marinier D, Barone P, Dehay C, Toroczkai Z, Knoblauch K, Van Essen DC, Kennedy H (2014) A weighted and directed interareal connectivity matrix for macaque cerebral cortex. Cereb Cortex 24(1):17–36. https://doi.org/10.1093/cercor/bhs270
Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Nat Acad Sci USA 99(12):7821–7826. https://doi.org/10.1073/pnas.122653799
Adamic LA, Glance N (2005) The political blogosphere and the 2004 U.S. election: Divided they blog. In: Proceedings of the 3rd international workshop on link discovery, LinkKDD ’05. ACM, New York, pp 36–43. https://doi.org/10.1145/1134271.1134277
Bu D, Zhao Y, Cai L, Xue H, Zhu X, Lu H, Zhang J, Sun S, Ling L, Zhang N, Li G, Chen R (2003) Topological structure analysis of the protein–protein interaction network in budding yeast. Nucleic Acids Res 31(9):2443–2450. https://doi.org/10.1093/nar/gkg340
Šubelj L, Bajec M (2012) Ubiquitousness of link-density and link-pattern communities in real-world networks. Europ Phys J B 85(1):32. https://doi.org/10.1140/epjb/e2011-20448-7
Mitzenmacher M (2004) A brief history of generative models for power law and lognormal distributions. Internet Math 1:226–251
Ou Q, Jin Y-D, Zhou T, Wang B-H, Yin B-Q (2007) Power-law strength-degree correlation from resource-allocation dynamics on weighted networks. Phys Rev E 75:021102. https://doi.org/10.1103/PhysRevE.75.021102
Liu Z, Zhang Q-M, Lü L, Zhou T (2011) Link prediction in complex networks: a local naïve bayes model. EPL (Europhys Lett) 96(4):48007. http://stacks.iop.org/0295-5075/96/i=4/a=48007
Schank T, Wagner D (2005) Approximating clustering coefficient and transitivity. J Graph Algorithms Appl 9:265–275
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30. http://dl.acm.org/citation.cfm?id=1248547.1248548
Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675–701. https://doi.org/10.1080/01621459.1937.10503522
Friedman M (1940) A comparison of alternative tests of significance for the problem of m rankings. Ann Math Statist 11(1):86–92. https://doi.org/10.1214/aoms/1177731944
Lü L, Pan L, Zhou T, Zhang Y-C, Stanley HE (2015) Toward link predictability of complex networks. Proc Natl Acad Sci 112(8):2325–2330. https://doi.org/10.1073/pnas.1424644112
Wang X, Sukthankar G (2013) Link prediction in multi-relational collaboration networks. In: Proceedings of the 2013 IEEE/ACM international conference on advances in social networks analysis and mining, ASONAM ’13. ACM, New York, pp 1445–1447. https://doi.org/10.1145/2492517.2492584
About this article
Cite this article
Kumar, A., Singh, S.S., Singh, K. et al. Level-2 node clustering coefficient-based link prediction. Appl Intell 49, 2762–2779 (2019). https://doi.org/10.1007/s10489-019-01413-8
DOI: https://doi.org/10.1007/s10489-019-01413-8