Credible seed identification for large-scale structural network alignment

Data Mining and Knowledge Discovery

Abstract

Structural network alignment uses topological structure to find correspondences between the nodes of two networks. Many useful algorithms have been proposed, but they usually require a prior mapping of seeds that act as landmark points for aligning the remaining nodes. Several seed-free algorithms have been developed to solve this cold-start problem. However, existing approaches suffer from high computational cost and low reliability, which limits their application to large-scale network alignment. Moreover, useful metrics for quantifying the credibility of seed mappings are lacking. To address these issues, we propose a credible seed identification framework and develop a metric to assess the reliability of a mapping. To tackle the cold-start problem, we employ graph embedding techniques to represent nodes as structural feature vectors in a latent space. We then leverage point set registration algorithms to match nodes algebraically and obtain an initial mapping of nodes. In addition, we propose a heuristic algorithm that improves the credibility of the initial mapping by filtering out mismatched node pairs. To tackle the computational problem in large-scale network alignment, we propose a divide-and-conquer scheme that divides large networks into smaller ones and matches them individually, which significantly improves the recall of the mapping results. Finally, we conduct extensive experiments to evaluate the effectiveness and efficiency of our approach. The results show that the proposed method outperforms state-of-the-art approaches in both effectiveness and efficiency.
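The general idea of seed-free matching followed by credibility filtering can be illustrated with a minimal sketch. This is not the algorithm from the paper: three hand-crafted structural features (degree, mean neighbor degree, triangle count) stand in for a learned graph embedding, and a mutual-nearest-neighbor test with a uniqueness check stands in for both the point set registration step and the heuristic filter. The function names `structural_features` and `credible_pairs` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def structural_features(adj):
    """Return one feature row per node: degree, mean neighbor degree, triangles.

    Assumption: these simple features replace a learned structural embedding.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    nbr_deg_sum = adj @ deg
    mean_nbr_deg = np.divide(nbr_deg_sum, deg,
                             out=np.zeros_like(deg), where=deg > 0)
    triangles = np.diag(adj @ adj @ adj) / 2.0
    return np.column_stack([deg, mean_nbr_deg, triangles])

def credible_pairs(feat_a, feat_b):
    """Keep (i, j) only if i and j are each other's unique nearest neighbor.

    Assumption: this filter is a stand-in for a credibility heuristic; it
    discards ambiguous matches instead of guessing between tied candidates.
    """
    dist = np.linalg.norm(feat_a[:, None, :] - feat_b[None, :, :], axis=2)
    best_b = dist.argmin(axis=1)   # nearest node in B for every node in A
    best_a = dist.argmin(axis=0)   # nearest node in A for every node in B
    row_unique = np.isclose(dist, dist.min(axis=1, keepdims=True)).sum(axis=1) == 1
    col_unique = np.isclose(dist, dist.min(axis=0, keepdims=True)).sum(axis=0) == 1
    return [(i, int(j)) for i, j in enumerate(best_b)
            if best_a[j] == i and row_unique[i] and col_unique[j]]

# Toy network A: a triangle (0, 1, 2) with a path 2-3-4-5 attached.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)]
adj_a = np.zeros((6, 6))
for u, v in edges:
    adj_a[u, v] = adj_a[v, u] = 1.0

# Network B is A with relabeled nodes: node i in A becomes perm[i] in B.
perm = np.array([3, 5, 0, 1, 4, 2])
adj_b = np.zeros_like(adj_a)
adj_b[np.ix_(perm, perm)] = adj_a

pairs = credible_pairs(structural_features(adj_a), structural_features(adj_b))
print(pairs)
```

In this toy case, nodes 0 and 1 of network A are structurally indistinguishable, so the filter drops them and keeps only the pairs it can match unambiguously. Such a reduced but reliable set of pairs is the kind of output that a seed-based alignment algorithm could then take as its starting mapping.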



Acknowledgements

The research presented in this paper is supported in part by the National Natural Science Foundation (Nos. 61602370, 61672026, 61772411, U1736205), the Postdoctoral Foundation (Nos. 201659M2806, 2018T111066), the Fundamental Research Funds for the Central Universities (Nos. 1191320006, PY3A022), the Shaanxi Postdoctoral Foundation, Project JCYJ20170816100819428 supported by SZSTI, and the CCF-NSFOCUS KunPeng Research Fund (No. CCF-NSFOCUS 2018006).

Author information

Corresponding author

Correspondence to Chenxu Wang.

Additional information

Responsible editor: Shuiwang Ji.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work extends a conference paper published at ICDCS 2018.

Cite this article

Wang, C., Wang, Y., Zhao, Z. et al. Credible seed identification for large-scale structural network alignment. Data Min Knowl Disc 34, 1744–1776 (2020). https://doi.org/10.1007/s10618-020-00699-4

  • DOI: https://doi.org/10.1007/s10618-020-00699-4
