PANDA: toward partial topology-based search on large networks in a single machine

  • Regular Paper
  • The VLDB Journal

Abstract

A large body of research has focused on efficient and scalable processing of subgraph search queries on large networks. In these efforts, a query is posed in the form of a connected query graph. Unfortunately, in practice, end users may not always have precise enough knowledge of the topological relationships between nodes in a query graph to formulate a connected query. In this paper, we present a novel graph querying paradigm called partial topology-based network search and propose a query processing framework called PANDA to efficiently process partial topology queries (PTQs) on a single machine. A PTQ is a disconnected query graph containing multiple connected query components. PTQs allow an end user to formulate queries without demanding precise information about the complete topology of a query graph. To this end, we propose an exact algorithm and an approximate algorithm, called SEN-PANDA and PO-PANDA, respectively, to generate top-k matches of a PTQ. We also present a subgraph simulation-based optimization technique to further speed up the processing of PTQs. Using real-life networks with millions of nodes, we experimentally verify that our proposed algorithms are superior to several baseline techniques.
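
To make the definition of a partial topology query concrete, the following is a minimal sketch, not the authors' implementation. It assumes a query graph given as a list of node names and a list of undirected edges, and it splits the graph into its connected query components with a breadth-first search. The function names `query_components` and `is_partial_topology_query` and the example nodes are hypothetical, chosen only for illustration.

```python
from collections import deque


def query_components(nodes, edges):
    """Split an undirected query graph into connected query components."""
    # Build an adjacency map (assumed representation, not from the paper).
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            u = queue.popleft()
            comp.append(u)
            for w in sorted(adj[u]):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(sorted(comp))
    return components


def is_partial_topology_query(nodes, edges):
    """A query is 'partial' here if it has more than one connected component."""
    return len(query_components(nodes, edges)) > 1


# Hypothetical query: the user knows a-b-c and d-e, but not how they connect.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
print(query_components(nodes, edges))
print(is_partial_topology_query(nodes, edges))
```

In this sketch, the example query has two components, so it counts as a partial topology query, whereas a conventional subgraph search query would consist of a single component.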

Notes

  1. As we shall see later, our solution framework can easily handle overlapping cases by mapping them to a Steiner tree problem.

  2. http://vlado.fmf.uni-lj.si/pub/networks/data/bio/Yeast/yeast.zip.

Acknowledgements

Qing Wang is supported by the National Natural Science Foundation of China under grants 61432001, 91318301, 91218302.

Author information

Corresponding author

Correspondence to Sourav S. Bhowmick.

Additional information

This work was primarily done when the first author was visiting Nanyang Technological University.

Electronic supplementary material

Supplementary material 1 (PDF, 28 KB)

About this article

Cite this article

Xie, M., Bhowmick, S.S., Cong, G. et al. PANDA: toward partial topology-based search on large networks in a single machine. The VLDB Journal 26, 203–228 (2017). https://doi.org/10.1007/s00778-016-0447-0

  • DOI: https://doi.org/10.1007/s00778-016-0447-0
