Efficient distributed subgraph similarity matching

Yuan, Ye; Wang, Guoren; Xu, Jeffery Yu; Chen, Lei

doi:10.1007/s00778-015-0381-6

Regular Paper
Published: 07 March 2015

Volume 24, pages 369–394 (2015)

The VLDB Journal

Ye Yuan¹, Guoren Wang¹, Jeffery Yu Xu² & Lei Chen³

Abstract

Given a query graph \(q\) and a data graph \(G\), subgraph similarity matching retrieves all matches of \(q\) in \(G\) in which the number of missing edges is bounded by a given threshold \(\epsilon\). Because it can handle applications that involve noisy or erroneous graph data, subgraph similarity matching has been studied extensively. In practice, a data graph can be extremely large, e.g., a web-scale graph containing hundreds of millions of vertices and billions of edges. State-of-the-art approaches process subgraph similarity queries with centralized algorithms, which makes them infeasible for such large graphs given the limited computational power and storage space of a single centralized server. To address this problem, we investigate subgraph similarity matching over a web-scale graph deployed in a distributed environment. We propose distributed algorithms and optimization techniques that exploit the properties of subgraph similarity matching in order to make full use of parallel computing power and to lower the communication cost among the distributed data centers during query processing. Specifically, we first relax and decompose \(q\) into a minimum number of sub-queries. Next, we dispatch each sub-query for exact matching in parallel. Finally, we schedule and join the exact matches to obtain the final query answers. Moreover, a workload-balancing strategy further speeds up query processing. Our experimental results demonstrate the feasibility of the proposed approach for subgraph similarity matching over web-scale graph data.
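The problem definition above can be made concrete with a small, centralized brute-force sketch. It is not the authors' distributed algorithm. It assumes a toy representation (vertex-labelled, undirected graphs stored as a label dict plus a set of `frozenset` edges) and assumes that a match is an injective, label-preserving vertex mapping; the names `similarity_matches`, `split_edges`, `covered_by_some_part`, and `map_edge` are hypothetical. The second half illustrates, via the pigeonhole principle, one standard reason why exact matches of sub-queries can act as candidates: if the edges of \(q\) are split into \(\epsilon + 1\) disjoint parts, any mapping that misses at most \(\epsilon\) edges leaves at least one part fully intact. The paper's own relaxation and minimum decomposition may differ from this simple round-robin split.

```python
from itertools import permutations

# Hypothetical toy representation (an assumption, not the paper's format):
# a graph is a dict mapping vertex id -> label plus a set of undirected
# edges, each stored as frozenset({u, v}).


def edge(u, v):
    """Build an undirected edge key."""
    return frozenset((u, v))


def map_edge(mapping, e):
    """Image of query edge e under a vertex mapping."""
    u, v = tuple(e)
    return edge(mapping[u], mapping[v])


def similarity_matches(q_labels, q_edges, g_labels, g_edges, eps):
    """Brute-force subgraph similarity matching (centralized, exponential).

    Returns (mapping, missing) pairs for every injective, label-preserving
    mapping of q's vertices into G under which at most eps edges of q have
    no counterpart in G.
    """
    q_vertices = sorted(q_labels)
    results = []
    for image in permutations(sorted(g_labels), len(q_vertices)):
        mapping = dict(zip(q_vertices, image))
        # Assumed matching criterion: vertex labels must agree exactly.
        if any(q_labels[u] != g_labels[mapping[u]] for u in q_vertices):
            continue
        missing = sum(1 for e in q_edges if map_edge(mapping, e) not in g_edges)
        if missing <= eps:
            results.append((mapping, missing))
    return results


def split_edges(q_edges, parts):
    """Round-robin split of q's edges into `parts` disjoint groups."""
    ordered = sorted(q_edges, key=sorted)
    return [set(ordered[i::parts]) for i in range(parts)]


def covered_by_some_part(mapping, parts, g_edges):
    """True if some part's edges are all present in G under `mapping`."""
    return any(all(map_edge(mapping, e) in g_edges for e in part)
               for part in parts)


if __name__ == "__main__":
    # Query: a labelled triangle A-B-C.
    q_labels = {0: "A", 1: "B", 2: "C"}
    q_edges = {edge(0, 1), edge(1, 2), edge(0, 2)}
    # Data graph: vertex 13 closes the triangle; vertex 12 lacks the edge to 10.
    g_labels = {10: "A", 11: "B", 12: "C", 13: "C"}
    g_edges = {edge(10, 11), edge(11, 12), edge(10, 13), edge(11, 13)}

    eps = 1
    parts = split_edges(q_edges, eps + 1)  # pigeonhole: eps + 1 disjoint parts
    for mapping, missing in similarity_matches(q_labels, q_edges,
                                               g_labels, g_edges, eps):
        print(mapping, "missing:", missing,
              "some part matched exactly:",
              covered_by_some_part(mapping, parts, g_edges))
```

Enumerating every mapping is exponential in the size of \(q\) and needs the whole graph on one machine, which is the limitation the paper targets with decomposition, parallel exact matching, and joins. The round-robin split also ignores whether each part is connected, something a practical decomposition would likely have to consider.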

Similar content being viewed by others

Efficient Subgraph Matching on Large RDF Graphs Using MapReduce

Article Open access 04 April 2019

Xin Wang, Lele Chai, … Yunpeng Chai

Approximate Subgraph Matching Query over Large Graph

Holistic Subgraph Search over Large Graphs

Notes

Throughout the rest of this paper, we assume \(q(v)>0\).

Acknowledgments

This work is supported in part by the NSFC (Grant Nos. 61100024, 61332006, U1401256), the Fundamental Research Funds for the Central Universities (Grant No. N130504006), the National Basic Research Program of China (973, Grant No. 2011CB302200-G), the Research Grants Council of the Hong Kong SAR, China (Grant Nos. 14209314 and 418512), the NSFC (Grant No. 61328202), the Hong Kong RGC Project N HKUST637/13, the National Grand Fundamental Research 973 Program of China under Grant 2014CB340300, a Microsoft Research Asia Gift Grant, and a Google Faculty Award 2013.

Author information

Authors and Affiliations

Northeastern University, Shenyang, China
Ye Yuan & Guoren Wang
Chinese University of Hong Kong, Shatin, Hong Kong
Jeffery Yu Xu
Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
Lei Chen

Corresponding author

Correspondence to Ye Yuan.

About this article

Cite this article

Yuan, Y., Wang, G., Xu, J.Y. et al. Efficient distributed subgraph similarity matching. The VLDB Journal 24, 369–394 (2015). https://doi.org/10.1007/s00778-015-0381-6

Received: 16 April 2014
Accepted: 13 February 2015
Published: 07 March 2015
Issue Date: June 2015
DOI: https://doi.org/10.1007/s00778-015-0381-6
