Efficient graph similarity join for information integration on graphs

Wang, Yue; Wang, Hongzhi; Li, Jianzhong; Gao, Hong

doi:10.1007/s11704-015-4505-3

Research Article
Published: 24 November 2015

Volume 10, pages 317–329 (2016)

Frontiers of Computer Science

Abstract

Graphs are widely used to represent complex data in real applications such as social networks, bioinformatics, and computer vision. Graph similarity join has therefore become essential for integrating noisy and inconsistent data from multiple sources. Edit distance is commonly used to measure the similarity between graphs, and the graph similarity join problem studied in this paper is defined under graph edit distance constraints. To accelerate the join, we first apply a preprocessing strategy that removes mismatching graph pairs with significant differences. We then propose the k-hop tree based indexing method, which builds an index for each graph by grouping, for each key node, the nodes reachable within k hops while preserving their structure. For each candidate pair, we propose a similarity computation algorithm with boundary filtering that is both efficient and effective. Experiments on real and synthetic graph databases confirm that our method achieves good join quality, and that the join process finishes in polynomial time.
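To make the first two steps of the abstract concrete, the following Python sketch illustrates one possible reading of them. It is not the authors' implementation: the abstract does not specify the preprocessing filter or the exact index layout, so the sketch assumes unlabeled undirected graphs stored as adjacency dictionaries, unit-cost edit operations, the well-known count-based lower bound on graph edit distance as the filter, and a plain breadth-first tree as the "k-hop tree". All function names (`count_lower_bound`, `prefilter`, `k_hop_tree`) are hypothetical.

```python
from collections import deque


def edge_count(adj):
    """Number of undirected edges in an adjacency dict {node: set(neighbours)}."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def count_lower_bound(adj1, adj2):
    """Lower bound on unit-cost graph edit distance.

    Each vertex insertion/deletion changes the vertex count by one and each
    edge insertion/deletion changes the edge count by one, so the edit
    distance is at least the sum of the two count differences.
    """
    return (abs(len(adj1) - len(adj2))
            + abs(edge_count(adj1) - edge_count(adj2)))


def prefilter(graphs_a, graphs_b, tau):
    """Keep only index pairs whose lower bound does not exceed threshold tau.

    Assumed stand-in for the paper's preprocessing step.
    """
    return [(i, j)
            for i, ga in enumerate(graphs_a)
            for j, gb in enumerate(graphs_b)
            if count_lower_bound(ga, gb) <= tau]


def k_hop_tree(adj, root, k):
    """Breadth-first tree of the nodes within k hops of `root`.

    Returns {node: [children]}; parent/child links record how each node
    was first reached, which is one simple way to keep local structure.
    """
    depth = {root: 0}
    children = {root: []}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if depth[u] == k:          # do not expand beyond k hops
            continue
        for v in sorted(adj[u]):   # sorted for a deterministic tree
            if v not in depth:
                depth[v] = depth[u] + 1
                children[u].append(v)
                children[v] = []
                queue.append(v)
    return children


# Example data: a 4-node path and a triangle.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
```

With these example graphs, `prefilter([path], [triangle, path], 0)` discards the path/triangle pair because the two graphs differ by one vertex, and `k_hop_tree(path, 1, 2)` stops at node 3. A pair that survives such a filter is only a candidate; it still needs the exact or bounded similarity computation that the abstract describes.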

Author information

Authors and Affiliations

Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
Yue Wang, Hongzhi Wang, Jianzhong Li & Hong Gao

Corresponding author

Correspondence to Hongzhi Wang.

Additional information

Yue Wang is a master's student majoring in computer science and technology at Harbin Institute of Technology, China. Her main research area is XML data management.

Hongzhi Wang is an associate professor at Harbin Institute of Technology, China. His research area is data management, including data quality, XML data management, and graph management. He is a recipient of the CCF Outstanding Dissertation Award, the Microsoft Fellowship, and the IBM PhD Fellowship.

Jianzhong Li is a professor and doctoral supervisor at Harbin Institute of Technology, China. He is a senior member of CCF. His research interests include databases, parallel computing, and wireless sensor networks.

Hong Gao is a professor and doctoral supervisor at Harbin Institute of Technology, China. She is a senior member of CCF. Her research interests include data management, wireless sensor networks, and graph databases.

Electronic supplementary material

Supplementary material, approximately 357 KB.

About this article

Cite this article

Wang, Y., Wang, H., Li, J. et al. Efficient graph similarity join for information integration on graphs. Front. Comput. Sci. 10, 317–329 (2016). https://doi.org/10.1007/s11704-015-4505-3

Received: 10 November 2014
Accepted: 02 August 2015
Published: 24 November 2015
Issue Date: April 2016
DOI: https://doi.org/10.1007/s11704-015-4505-3
