Skip to main content
Log in

Answering subgraph queries over massive disk resident graphs

  • Published:
World Wide Web Aims and scope Submit manuscript

Abstract

Due to its wide applications, subgraph query has attracted lots of attentions in database community. In this paper, we focus on subgraph query over a single large graph G, i.e., finding all embeddings of query Q in G. Different from existing feature-based approaches, we map all edges into a two-dimensional space R 2 and propose a bitmap structure to index R 2. At run time, we find a set of adjacent edge pairs (AEP) or star-style patterns (SSP) to cover Q. We develop edge join (EJ) algorithms to address both AEP and SSP subqueries. Based on the bitmap index, our method can optimize I/O and CPU cost. More importantly, our index has the linear space complexity instead of exponential complexity in feature-based approaches, which indicates that our index can scale well with respect to large data size. Furthermore, our index has light maintenance overhead, which has not been considered in most of existing work. Extensive experiments show that our method significantly outperforms existing ones in both online and offline processing with respect to query response time, index building time, index size and index maintenance overhead.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Similar content being viewed by others

References

  1. Albert, R., Barabási, A.-L.: Statistical mechanics of complex networks. Rev. Mod. Phys., 74 (2002)

  2. Chan, E.P.F., Lim, H.: Optimization and evaluation of shortest path queries. VLDB J. 16(3) (2007)

  3. Chen, Y., Chen, Y.: An efficient algorithm for answering graph reachability queries. In: ICDE (2008)

  4. Cheng, J., Ke, Y., Ng, W., Lu, A.: fg-index: Towards verification-free query processing on graph databases. In: SIGMOD (2007)

  5. Cheng, J., Yu, J.X.: On-line exact shortest distance query processing. In: EDBT (2009)

  6. Cohen, E., Halperin, E., Kaplan, H., Zwick, U.: Reachability and distance queries via 2-hop labels. SIAM J. Comput. 32(5) (2003)

  7. Cordella, L.P., Foggia, P., Sansone, C., Vento, M.: A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans. Pattern Anal. Mach. Intell. 26(10), 1367–1372 (2004)

    Article  Google Scholar 

  8. Fan, W., Li, J., Ma, S., Tang, N., Wu, Y., Wu, Y.: Graph pattern matching: From intractable to polynomial time. PVLDB 3(1), 264–275 (2010)

    Google Scholar 

  9. Fan, W., Li, J., Wang, X., Wu, Y.: Query preserving graph compression. In: SIGMOD Conference, pp. 157–168 (2012)

  10. Fan, W., Wang, X., Wu, Y.: Answering graph pattern queries using views. In: IEEE 30th International Conference on Data Engineering, Chicago, ICDE 2014, IL, USA, March 31 - April 4, 2014, pp. 184–195 (2014)

  11. Gao, J., Zhou, C., Zhou, J., Yu, J.X.: Continuous pattern detection over billion-edge graph using distributed framework. In: IEEE 30th International Conference on Data Engineering, Chicago, ICDE 2014, IL, USA, March 31 - April 4, 2014, pp. 556–567 (2014)

  12. Han, W.-S., Lee, J., Lee, J.-H.: Turboiso: towards ultrafast and robust subgraph isomorphism search in large graph databases. In: SIGMOD Conference, pp. 337–348 (2013)

  13. He, H., Singh, A.K.: Closure-tree: An index structure for graph queries. In: ICDE (2006)

  14. He, H., Singh, A.K.: Graphs-at-a-time: query language and access methods for graph databases. In: SIGMOD Conference, pp. 405–418 (2008)

  15. http://giraph.apache.org

  16. Jiang, P.Y.H., Wang, H., Zhou, S.: Gstring: A novel approach for efficient search in graph databases. In: ICDE (2007)

  17. Jing, N., Huang, Y.-W., Rundensteiner, E.A.: Hierarchical encoded path views for path query processing: An optimal model and its performance evaluation. IEEE Trans. Knowl. Data Eng. 10(3) (1998)

  18. Khan, A., Li, N., Yan, X., Guan, Z., Chakraborty, S., Tao, S.: Neighborhood based fast graph search in large networks. In: SIGMOD Conference, pp. 901–912 (2011)

  19. Klein, K., Kriege, N., Mutzel, P.: Ct-index: Fingerprint-based graph indexing combining cycles and trees. In: ICDE, pp. 1115–1126 (2011)

  20. Lee, J., Han, W.-S., Kasperovics, R., Lee, J.-H.: An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB 6(2), 133–144 (2012)

    Google Scholar 

  21. Murty, K.G.: Network Programming. Prentice Hall (1992)

  22. Sakr, S., Elnikety, S., He, Y.: G-sparql: a hybrid engine for querying large attributed graphs. In: CIKM, pp. 335–344 (2012)

  23. Shao, B., Wang, H., Li, Y.: Trinity: a distributed graph engine on a memory cloud. In: SIGMOD Conference, pp. 505–516 (2013)

  24. Shasha, D., Wang, J.T.-L., Giugno, R.: Algorithmics and applications of tree and graph searching. In: PODS (2002)

  25. Sun, Z., Wang, H., Wang, H., Shao, B., Li, J.: Efficient subgraph matching on billion node graphs. PVLDB 5(9), 788–799 (2012)

    Google Scholar 

  26. Tian, Y., McEachin, R.C., Santos, C., States, D.J., Patel, J.M.: Saga: a subgraph matching tool for biological graphs. Bioinformatics 23(2) (2007)

  27. Tian, Y., Patel, J.M.: Tale: A tool for approximate large graph matching. In: ICDE, pp. 963–972 (2008)

  28. Tong, H., Faloutsos, C., Gallagher, B., Eliassi-Rad, T.: Fast best-effort pattern matching in large attributed graphs. In: KDD, pp. 737–746 (2007)

  29. TriSSl, S., Leser, U.: Fast and practical indexing and querying of very large graphs. In: SIGMOD (2007)

  30. Ullmann, J.R.: An algorithm for subgraph isomorphism. J. ACM 23(1) (1976)

  31. Wang, H., He, H., Yang, J., Yu, P.S., Yu, J.X.: Dual labeling: Answering graph reachability queries in constant time. In: ICDE (2006)

  32. Williams, J. H. D.W., Wang, W.: Graph database indexing using structured graph decomposition. In: ICDE (2007)

  33. Yan, X., Yu, P.S., Han, J.: Graph indexing: A frequent structure-based approach. In: SIGMOD (2004)

  34. Yang, S., Wu, Y., Sun, H., Yan, X.: Schemaless and structureless graph querying. PVLDB 7(7), 565–576 (2014)

    Google Scholar 

  35. Zeng, Z., Tung, A.K.H., Wang, J., Feng, J., Zhou, L.: Comparing stars: On approximating graph edit distance. PVLDB 2(1), 25–36 (2009)

    Google Scholar 

  36. Zhang, S., Hu, M., Yang, J.: Treepi: A novel graph indexing method. In: ICDE (2007)

  37. Zhang, S., Li, S., Yang, J.: Gaddi: distance index based subgraph matching in biological networks. In: EDBT, pp. 192–203 (2009)

  38. Zhang, S., Yang, J., Jin, W.: Sapper: Subgraph indexing and approximate matching in large graphs. PVLDB 3(1), 1185–1194 (2010)

    Google Scholar 

  39. Zhao, P., Han, J.: On graph query optimization in large networks. In: VLDB (2010)

  40. Zhao, P., Yu, J.X., Yu, P.S.: Graph indexing: Tree + delta >= graph. In: VLDB (2007)

  41. Zheng, W., Zou, L., Lian, X., Wang, D., Zhao, D.: Graph similarity search with edit distance constraint in large graph databases. In: CIKM, pp. 1595–1600 (2013)

  42. Zhu, K., Zhang, Y., Lin, X., Zhu, G., Wang, W.: Nova: A novel and efficient framework for finding subgraph isomorphism mappings in large graphs. In: DASFAA (1) (2010)

  43. Zou, L., Chen, L., Özsu, M.T.: Distancejoin: Pattern match query in a large graph database. PVLDB 2(1), 886–897 (2009)

    Google Scholar 

  44. Zou, L., Chen, L., Yu, J.X., Lu, Y.: A novel spectral coding in a large graph database. In: EDBT (2008)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Lei Zou.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Peng, P., Zou, L., Chen, L. et al. Answering subgraph queries over massive disk resident graphs. World Wide Web 19, 417–448 (2016). https://doi.org/10.1007/s11280-014-0322-0

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11280-014-0322-0

Keywords

Navigation