
Spark’s GraphX-based link prediction for social communication using triangle counting

  • Original Article
  • Social Network Analysis and Mining

Abstract

Link prediction on a given snapshot of a network topology is a crucial task for analyzing how social networks evolve. It identifies missing links in existing community networks and anticipates links that will form or disappear in future ones, and it has attracted attention in many fields. Over the past decade, many methods have been proposed to predict likely links in a social network. Evaluating these methods is difficult on very complex networks because the computing cost becomes prohibitive, and predicting missing links efficiently and accurately in an incomplete complex network remains challenging. The underlying intuition is that two nodes sharing a large number of common neighbors are likely to be connected. Numerous similarity indices built on this idea achieve good accuracy and efficiency. In this paper, we propose one such index, the Clustering Coefficient Index, which uses triangle counting implemented on GraphX, the graph-processing component of Apache Spark. The proposed index exploits the formation of triangles in the network topology together with clustering coefficients. Experimental results show that the proposed method outperforms other existing methods in predicting suitable links.
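The abstract does not give the formula for the proposed index, so the following is only a minimal single-machine sketch of how triangle counts and clustering coefficients can feed a common-neighbor score. It assumes one plausible form of such an index: the score of an unlinked pair is the sum of the local clustering coefficients of its common neighbors. That scoring rule, and all function names here, are illustrative assumptions and not the authors' definition or their Spark GraphX implementation.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops and duplicates."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangle_counts(adj):
    """Count the triangles that pass through each node."""
    counts = {}
    for node, nbrs in adj.items():
        # A triangle through `node` is a pair of its neighbors that are linked.
        counts[node] = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return counts


def clustering_coefficients(adj):
    """Local clustering coefficient: triangles / possible neighbor pairs."""
    tri = triangle_counts(adj)
    coeffs = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        coeffs[node] = 2.0 * tri[node] / (k * (k - 1)) if k > 1 else 0.0
    return coeffs


def score_pair(adj, coeffs, x, y):
    """Assumed index: sum of clustering coefficients over common neighbors."""
    common = adj.get(x, set()) & adj.get(y, set())
    return sum(coeffs[z] for z in common)


def rank_missing_links(edges):
    """Score every currently unlinked pair and sort by descending score."""
    adj = build_adjacency(edges)
    coeffs = clustering_coefficients(adj)
    nodes = sorted(adj)
    scored = [
        (score_pair(adj, coeffs, x, y), x, y)
        for x, y in combinations(nodes, 2)
        if y not in adj[x]
    ]
    # Ties are broken by node identifiers so the ordering is deterministic.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored
```

Calling `rank_missing_links` on a list of `(u, v)` edge tuples returns `(score, x, y)` triples with the highest-scoring candidate links first. In a distributed setting, the per-node triangle counts would presumably come from GraphX's `triangleCount` operator instead of the nested loop used here.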

Acknowledgements

This work was supported by the Ministry of Human Resource Development, Indian Institute of Technology (ISM), Govt. of India, under Grant Number TEQIP-III/2018. The authors express their gratitude to the Department of Computer Science and Engineering, Indian Institute of Technology (ISM), Dhanbad, India, for its research support.

Author information

Corresponding author

Correspondence to Ramesh Dharavath.

About this article

Cite this article

Dharavath, R., Arora, N.S. Spark’s GraphX-based link prediction for social communication using triangle counting. Soc. Netw. Anal. Min. 9, 28 (2019). https://doi.org/10.1007/s13278-019-0573-y

  • DOI: https://doi.org/10.1007/s13278-019-0573-y
