Abstract
Structural network alignment utilizes the topological structure information to find correspondences between nodes of two networks. Researchers have proposed a line of useful algorithms which usually require a prior mapping of seeds acting as landmark points to align the rest nodes. Several seed-free algorithms are developed to solve the cold-start problem. However, existing approaches suffer high computational cost and low reliability, limiting their applications to large-scale network alignment. Moreover, there is a lack of useful metrics to quantify the credibility of seed mappings. To address these issues, we propose a credible seed identification framework and develop a metric to assess the reliability of a mapping. To tackle the cold-start problem, we employ graph embedding techniques to represent nodes by structural feature vectors in a latent space. We then leverage point set registration algorithms to match nodes algebraically and obtain an initial mapping of nodes. Besides, we propose a heuristic algorithm to improve the credibility of the initial mapping by filtering out mismatched node pairs. To tackle the computational problem in large-scale network alignment, we propose a divide-and-conquer scheme to divide large networks into smaller ones and then match them individually. It significantly improves the recall of mapping results. Finally, we conduct extensive experiments to evaluate the effectiveness and efficiency of our new approach. The results illustrate that the proposed method outperforms the state-of-the-art approaches in terms of both effectiveness and efficiency.











Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Aladağ AE, Erten C (2013) Spinal: scalable protein interaction network alignment. Bioinformatics 29(7):917–924
Backstrom L, Dwork C, Kleinberg J (2007) Wherefore art thou r3579x? Anonymized social networks, hidden patterns, and structural steganography. In: International world wide web conference. ACM, pp 181–190
Bayati M, Gleich DF, Saberi A, Wang Y (2013) Message-passing algorithms for sparse network alignment. ACM TKDD 7(1):1–31
Bayati M, Gerritsen M, Gleich DF, Saberi A, Wang Y (2009) Algorithms for large, sparse network alignment problems. In: International conference on data mining (ICDM). IEEE, pp 705–710
Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):P10008
Box GE, Hunter WG, Hunter JS et al (1978) Statistics for experimenters. Wiley, New York
Chen Z, Yu X, Song B, Gao J, Hu X, Yang WS (2017) Community-based network alignment for large attributed network. In: Conference on information and knowledge management (CIKM). ACM, pp 587–596
Chui H, Rangarajan A (2000) A feature registration framework using mixture models. In: IEEE workshop on mathematical methods in biomedical image analysis. IEEE, pp 190–197
Collins SR, Kemmeren P, Zhao XC, Greenblatt JF, Spencer F, Holstege FC, Weissman JS, Krogan NJ (2007) Toward a comprehensive atlas of the physical interactome of saccharomyces cerevisiae. Mol Cell Proteom 6(3):439–450
Conte D, Foggia P, Sansone C, Vento M (2004) Thirty years of graph matching in pattern recognition. Int J Pattern Recognit Artif Intell 18(03):265–298
Du B, Tong H (2018) Fasten: fast Sylvester equation solver for graph mining. In: SIGKDD conference on knowledge discovery and data mining. ACM, pp 1339–1347
Emms D, Hancock ER, Wilson RC (2007) A correspondence measure for graph matching using the discrete quantum walk. International workshop on graph-based representations in pattern recognition. Springer, Berlin, pp 81–91
Fabiana C, Garetto M, Leonardi E (2015) De-anonymizing scale-free social networks by percolation graph matching. In: IEEE conference on computer communications. IEEE, pp 1571–1579
Fitzgibbon AW (2003) Robust registration of 2d and 3d point sets. Image Vis Comput 21(13):1145–1153
Garey MR, Johnson DS (2002) Computers and intractability, vol 29. WH Freeman, New York
Gold S, Rangarajan A (1996) A graduated assignment algorithm for graph matching. IEEE Trans Pattern Anal Mach Intell 18(4):377–388
Gold S, Lu CP, Rangarajan A, Pappu S, Mjolsness E (1995) New algorithms for 2d and 3d point matching: pose estimation and correspondence. In: Annual conference on neural information processing systems, pp 957–964
Grover A, Leskovec J (2016) node2vec: scalable feature learning for networks. In: SIGKDD conference on knowledge discovery and data mining. ACM, pp 855–864
Hashemifar S, Ma J, Naveed H, Canzar S, Xu J (2016) Modulealign: module-based global alignment of protein-protein interaction networks. Bioinformatics 32(17):i658–i664
Heimann M, Shen H, Koutra D (2018) Node representation learning for multiple networks: the case of graph alignment. arXiv preprint arXiv:1802.06257
Hu W, Qu Y, Cheng G (2008) Matching large ontologies: a divide-and-conquer approach. Data Knowl Eng 67(1):140–160
Ji S, Li W, Srivatsa M, Beyah R (2014a) Structural data de-anonymization. In: ACM conference on computer and communications security (CCS). ACM, pp 1040–1053
Ji S, Li W, Srivatsa M, He JS, Beyah R (2014b) Structure based data de-anonymization of social networks and mobility traces. In: Information security. Springer, pp 237–254
Ji S, Li W, Mittal P, Hu X, Beyah R (2015) Secgraph: a uniform and open-source evaluation system for graph data anonymization and de-anonymization. In: USENIX security, pp 303–318
Ji S, Li W, Yang S, Mittal P, Beyah R (2016) On the relative de-anonymizability of graph data: quantification and evaluation. In: IEEE conference on computer communications. IEEE, pp 1–9
Klau GW (2009) A new graph-based method for pairwise global network alignment. Bioinformatics 10(1):S59
Kollias G, Mohammadi S, Grama A (2012) Network similarity decomposition (nsd): a fast and scalable approach to network alignment. IEEE Trans Knowl Data Eng 24(12):2232–2243
Korula N, Lattanzi S (2014) An efficient reconciliation algorithm for social networks. In: International conference on very large data bases, VLDB endowment, pp 377–388
Koutra D, Tong H, Lubensky D (2013) Big-align: fast bipartite graph alignment. In: International conference on data mining (ICDM). IEEE, pp 389–398
Kuchaiev O, Milenković T, Memišević V, Hayes W, Pržulj N (2010) Topological network alignment uncovers biological function and phylogeny. J R Soc Interface 7(50):1341–1354
Leskovec J, Krevl A (2014) Snap datasets: Stanford large network dataset collection. http://snap.stanford.edu/data. Accessed 10 Apr 2020
Liu L, Cheung WK, Li X, Liao L (2016) Aligning users across social networks using network embedding. In: International joint conferences on artificial intelligence (IJCAI), pp 1774–1780
Luo B, Hancock ER (2001) Structural graph matching using the em algorithm and singular value decomposition. IEEE Trans Pattern Anal Mach Intell 23(10):1120–1136
Malod-Dognin N, Pržulj N (2015) L-graal: Lagrangian graphlet-based network aligner. Bioinformatics 31(13):2182–2189
Mamano N, Hayes W (2016) Sana: simulated annealing network alignment applied to biological networks. arXiv preprint arXiv:1607.02642
Melnik S, Garcia-Molina H, Rahm E (2002) Similarity flooding: a versatile graph matching algorithm and its application to schema matching. In: International conference on data engineering, vol 2002. IEEE Computer Society, pp 117–128
Myronenko A, Song X (2010) Point set registration: coherent point drift. IEEE Trans Pattern Anal Mach Intell 32(12):2262–2275
Narayanan A, Shmatikov V (2009) De-anonymizing social networks. In: IEEE symposium on security and privacy. IEEE, pp 173–187
Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. In: International conference on very large data bases. VLDB endowment, pp 102–114
Nilizadeh S, Kapadia A, Ahn YY (2014) Community-enhanced de-anonymization of online social networks. In: ACM conference on computer and communications security (CCS). ACM, pp 537–548
Pedarsani P, Figueiredo DR, Grossglauser M (2013) A Bayesian method for matching two similar graphs without seeds. In: Allerton. IEEE, pp 1598–1607
Peng W, Li F, Zou X, Wu J (2014) A two-stage deanonymization attack against anonymized social networks. IEEE Trans Comput 63(2):290–303
Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: online learning of social representations. In: SIGKDD conference on knowledge discovery and data mining. ACM, pp 701–710
Pfeiffer JJ, Neville J (2011) Methods to determine node centrality and clustering in graphs with uncertain structure. In: International AAAI conference on web and social media (ICWSM), pp 590–593
Pothen A, Simon HD, Liou K (2000) Partitioning sparse matrices with eigenvectors of graphs. In: International conference on very large data bases. VLDB endowment, pp 102–114
Ribeiro LF, Saverese PH, Figueiredo DR (2017) struc2vec: learning node representations from structural identity. In: SIGKDD conference on knowledge discovery and data mining. ACM, pp 385–394
Robles-Kelly A, Hancock ER (2003) Graph matching using spectral seriation. In: International workshop on energy minimization methods in computer vision and pattern recognition. Springer, pp 517–532
Rubinstein RY, Kroese DP (2016) Simulation and the Monte Carlo method, vol 10. Wiley, New York
Rusinkiewicz S, Levoy M (2001) Efficient variants of the ICP algorithm. In: International conference on 3-D digital imaging and modeling. IEEE, pp 145–152
Scott J (2000) Social network analysis: a handbook. In: International conference on very large data bases. VLDB endowment, pp 102–114
Singh R, Xu J, Berger B (2007) Pairwise global alignment of protein interaction networks by matching neighborhood topology. In: Annual international conference on research in computational molecular biology. Springer, pp 16–31
Srivatsa M, Hicks M (2012) Deanonymizing mobility traces: using social network as a side-channel. In: ACM conference on computer and communications security (CCS). ACM, pp 628–637
Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q (2015) Line: large-scale information network embedding. In: International world wide web conference. ACM, pp 1067–1077
Vijayan V, Saraph V, Milenković T (2015) Magna++: maximizing accuracy in global network alignment via both node and edge conservation. Bioinformatics 31(14):2409–2411
Wang C, Zhao Z, Wang Y, Qin D, Luo X, Qin T (2018) Deepmatching: a structural seed identification framework for social network alignment. In: International conference on distributed computing systems, pp 459–470
Wondracek G, Holz T, Kirda E, Kruegel C (2010) A practical attack to de-anonymize social network users. In: IEEE symposium on security and privacy. IEEE, pp 223–238
Yang Q, Sze SH (2007) Path matching and graph matching in biological networks. J Comput Biol 14(1):56–67
Yartseva L, Grossglauser M (2013) On the performance of percolation graph matching. In: ACM conference on Online social networks. ACM, pp 119–130
Zaslavskiy M, Bach F, Vert JP (2009) Global alignment of protein-protein interaction networks by graph matching methods. Bioinformatics 25(12):i259–1267
Zhang Z (1994) Iterative point matching for registration of free-form curves and surfaces. Int J Comput Vis 13(2):119–152
Zhang J, Philip SY (2015) Integrated anchor and social link predictions across social networks. In: Twenty-fourth international joint conference on artificial intelligence
Zhang J, Yu PS (2015) Multiple anonymized social network alignment. In: International conference on data mining (ICDM). IEEE, pp 599–608
Zhang S, Tong H (2016) Final: fast attributed network alignment. In: SIGKDD conference on knowledge discovery and data mining. ACM, pp 1345–1354
Zhang Y, Tang J, Yang Z, Pei J, Yu PS (2015) Cosnet: connecting heterogeneous social networks with local and global consistency. In: SIGKDD conference on knowledge discovery and data mining. ACM, pp 1485–1494
Zhang S, Tong H, Tang J, Xu J, Fan W (2017) ineat: incomplete network alignment. In: International conference on data mining (ICDM). IEEE, pp 1189–1194
Zhang S, Tong H, Maciejewski R, Eliassi-Rad T (2019) Multilevel network alignment. In: International world wide web conference. ACM, pp 2344–2354
Zhou X, Liang X, Zhang H, Ma Y (2016) Cross-platform identification of anonymous identical users in multiple social media networks. IEEE Trans Knowl Data Eng 28(2):411–424
Acknowledgements
The research presented in this paper is supported in part by National Natural Science Foundation (Nos. 61602370, 61672026, 61772411, U1736205), Postdoctoral Foundation (Nos. 201659M2806, 2018T111066), Fundamental Research Funds for the Central Universities (Nos. 1191320006, PY3A022), Shaanxi Postdoctoral Foundation, Project JCYJ20170816100819428 supported by SZSTI, CCF-NSFOCUS KunPeng Research Fund (No. CCF-NSFOCUS 2018006).
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editor: Shuiwang Ji.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work is extended from a conference paper published in ICDCS 2018.
Rights and permissions
About this article
Cite this article
Wang, C., Wang, Y., Zhao, Z. et al. Credible seed identification for large-scale structural network alignment. Data Min Knowl Disc 34, 1744–1776 (2020). https://doi.org/10.1007/s10618-020-00699-4
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-020-00699-4