
Level-2 node clustering coefficient-based link prediction

Applied Intelligence

Abstract

Link prediction finds missing links in static networks or future (new) links in dynamic networks. Its study is crucial to the analysis of network evolution. In the last decade, many works on link prediction in social networks have been presented, and link prediction has played a pivotal role in the analysis of complex networks, including social and biological networks. In this work, we propose a new approach to link prediction based on the level-2 node clustering coefficient. The approach defines the notion of a level-2 common node and its corresponding clustering coefficient, which captures clustering information about the level-2 common neighbors of the seed node pair; the similarity score is then computed from this information. We simulated the existing methods (three classical methods, viz. common neighbors, resource allocation and preferential attachment; the clustering coefficient-based methods CCLP and NLC; local naive Bayes-based common neighbors (LNBCN); Cannistraci-Alanis-Ravasi (CAR); and the recent Node2vec method) together with the proposed method on 11 real-world network datasets. Accuracy is measured with four well-known single-point summary statistics, viz. area under the ROC curve (AUROC), area under the precision-recall curve (AUPR), average precision and recall. Comprehensive experiments on the four metrics and 11 datasets show that the proposed method performs better. The time complexity of the proposed method is also given and is of the same order as that of the existing method CCLP. A statistical test (the Friedman test) shows that the proposed method differs significantly from the existing methods considered in the paper.
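To make the baselines named in the abstract concrete, the following is a minimal sketch, assuming an undirected, unweighted graph stored as a plain adjacency dict. It computes the common neighbors (CN), resource allocation (RA) and preferential attachment (PA) scores, and CCLP in the form commonly stated for it: the sum of the local clustering coefficients of the common neighbors. It also computes AUROC as the probability that a held-out (missing) link outscores a nonexistent link, with ties counted as 0.5. All function names are hypothetical helpers for illustration. The level-2 score proposed in the paper is not reproduced, since its exact formula is not given in the abstract.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected, unweighted graph as a dict: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficient(adj, z):
    """Local clustering coefficient C(z) = 2 * T(z) / (k_z * (k_z - 1))."""
    nbrs = adj.get(z, set())
    k = len(nbrs)
    if k < 2:
        return 0.0
    # T(z): number of edges among the neighbours of z (triangles through z)
    triangles = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * triangles / (k * (k - 1))


def common(adj, x, y):
    """Set of common neighbours of x and y."""
    return adj.get(x, set()) & adj.get(y, set())


def common_neighbors(adj, x, y):
    return len(common(adj, x, y))


def resource_allocation(adj, x, y):
    # Each common neighbour z contributes 1 / degree(z)
    return sum(1.0 / len(adj[z]) for z in common(adj, x, y))


def preferential_attachment(adj, x, y):
    return len(adj.get(x, set())) * len(adj.get(y, set()))


def cclp(adj, x, y):
    # CCLP-style score: sum of the clustering coefficients of common neighbours
    return sum(clustering_coefficient(adj, z) for z in common(adj, x, y))


def auroc(score, adj, missing, nonexistent):
    """AUROC over every (missing, nonexistent) pair, ties counted as 0.5.
    Exhaustive comparison; large graphs usually sample pairs instead."""
    total = 0.0
    for a, b in missing:
        s_pos = score(adj, a, b)
        for c, d in nonexistent:
            s_neg = score(adj, c, d)
            if s_pos > s_neg:
                total += 1.0
            elif s_pos == s_neg:
                total += 0.5
    return total / (len(missing) * len(nonexistent))


if __name__ == "__main__":
    adj = build_adjacency([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5)])
    print(common_neighbors(adj, 1, 4), cclp(adj, 1, 4))
    print(auroc(common_neighbors, adj, [(1, 4)], [(1, 5), (3, 5)]))
```

On this toy graph, nodes 1 and 4 share the neighbours 2 and 3, each with clustering coefficient 2/3, so CN gives 2 and the CCLP-style score gives 4/3. A real evaluation would remove the test edges from the training graph before scoring and would add AUPR, average precision and recall alongside AUROC.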


Figs. 1–7


Notes

  1. https://sanchom.wordpress.com/tag/average-precision/

  2. https://ils.unc.edu/courses/2013_spring/inls509_001/lectures/10-EvaluationMetrics.pdf

  3. https://neurodata.io/project/connectomes/

  4. http://www-personal.umich.edu/~mejn/netdata/

  5. http://vlado.fmf.uni-lj.si/pub/networks/data/

  6. https://icon.colorado.edu/#!/networks

  7. https://snap.stanford.edu/data/


Author information


Corresponding author

Correspondence to Ajay Kumar.


About this article


Cite this article

Kumar, A., Singh, S.S., Singh, K. et al. Level-2 node clustering coefficient-based link prediction. Appl Intell 49, 2762–2779 (2019). https://doi.org/10.1007/s10489-019-01413-8


  • DOI: https://doi.org/10.1007/s10489-019-01413-8
