Skip to main content

Advertisement

Log in

Using partial evaluation in holistic subgraph search

  • Research Article
  • Published:
Frontiers of Computer Science Aims and scope Submit manuscript

Abstract

Because of its wide application, the subgraph matching problem has been studied extensively during the past decade. However, most existing solutions assume that a data graph is a vertex/edge-labeled graph (i.e., each vertex/edge has a simple label). These solutions build structural indices by considering the vertex labels. However, some real graphs contain rich-content vertices such as user profiles in social networks and HTML pages on the World Wide Web. In this study, we consider the problem of subgraph matching using a more general scenario. We build a structural index that does not depend on any vertex content. Based on the index, we design a holistic subgraph matching algorithm that considers the query graph as a whole and finds one match at a time. In order to further improve efficiency, we propose a “partial evaluation and assembly” framework to find subgraph matches over large graphs. Last but not least, our index has light maintenance overhead. Therefore, our method can work well on dynamic graphs. Extensive experiments on real graphs show that our method outperforms the state-of-the-art algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Zhang S J, Li S R, Yang J. GADDI: distance index based subgraph matching in biological networks. In: Proceedings of the 12th International Conference on Extending Database Technology. 2009, 192–203

    Chapter  Google Scholar 

  2. Watts D J, Dodds P S, Newman M E J. Identity and search in social networks. Science, 2002, 296(5571): 1302–1305

    Article  Google Scholar 

  3. Stocker M, Seaborne A, Bernstein A, Kiefer C, Reynolds D. SPARQL basic graph pattern optimization using selectivity estimation. In: Proceedings of the 17th International Conference on World Wide Web. 2008, 595–604

    Google Scholar 

  4. Cohen E, Halperin E, Kaplan H, Zwick U. Reachability and distance queries via 2-hop labels. SIAM Journal on Computing, 2003, 32(5): 1338–1355

    Article  MathSciNet  MATH  Google Scholar 

  5. Chan E P F, Lim H. Optimization and evaluation of shortest path queries. The VLDB Journal, 2007, 16(3): 343–369

    Article  Google Scholar 

  6. Jing N, Huang Y W, Rundensteiner E A. Hierarchical encoded path views for path query processing: an optimal model and its performance evaluation. IEEE Transactions on Knowledge and Data Engineering, 1998, 10(3): 409–432

    Article  Google Scholar 

  7. Cheng J F, Yu J X. On-line exact shortest distance query processing. In: Proceedings of the 12th International Conference on Extending Database Technology. 2009, 481–492

    Chapter  Google Scholar 

  8. Wang H X, He H, Yang J, Yu P S, Yu J X. Dual labeling: answering graph reachability queries in constant time. In: Proceedings of the 22nd International Conference on Data Engineering. 2006, 75

    Google Scholar 

  9. Trißl S, Leser U. Fast and practical indexing and querying of very large graphs. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2007, 845–856

    Google Scholar 

  10. Chen Y J, Chen Y B. An efficient algorithm for answering graph reachability queries. In: Proceedings of the 24th International Conference on Data Engineering. 2008, 893–902

    Google Scholar 

  11. Shasha D, Wang J T L, Giugno R. Algorithmics and applications of tree and graph searching. In: Proceedings of the 21st ACM SIGACTSIGMOD-SIGART Symposium on Principles of Database Systems. 2002, 39–52

    Google Scholar 

  12. Yan X F, Yu P S, Han J W. Graph indexing: a frequent structure-based approach. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2004, 335–346

    Google Scholar 

  13. He H H, Singh A K. Graphs-at-a-time: query language and access methods for graph databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2008, 405–418

    Google Scholar 

  14. Zhang S J, Hu M, Yang J. Tree Pi: a novel graph indexing method. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 966–975

    Google Scholar 

  15. Zhao P X, Yu J X, Yu P S. Graph indexing: tree + delta >= graph. In: Proceedings of the 33rd International Conference on Very Large Data Bases. 2007, 938–949

    Google Scholar 

  16. He H H, Singh A K. Closure-tree: an index structure for graph queries. In: Proceedings of the 22nd International Conference on Data Engineering. 2006, 38

    Google Scholar 

  17. Tian Y Y, Patel J M. TALE: a tool for approximate large graph matching. In: Proceedings of the 24th International Conference on Data Engineering. 2008, 963–972

    Google Scholar 

  18. Zhao P X, Han J W. On graph query optimization in large networks. Proceedings of the VLDB Endowment, 2010, 3(1-2): 340–351

    Article  Google Scholar 

  19. Peng P, Zou L, Chen L, Lin X M, Zhao D Y. Subgraph search over massive disk resident graphs. In: Proceedings of the 23rd International Conference on Scientific and Statistical Database Management. 2011, 312–321

    Chapter  Google Scholar 

  20. Sakr S, Elnikety S, He Y X. G-SPARQL: a hybrid engine for querying large attributed graphs. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. 2012, 335–344

    Google Scholar 

  21. Sun Z, Wang H Z, Wang H X, Shao B, Li J Z. Efficient subgraph matching on billion node graphs. Proceedings of the VLDB Endowment, 2012, 5(9): 788–799

    Article  Google Scholar 

  22. Han W S, Lee J, Lee J H. Turboiso: towards ultrafast and robust subgraph isomorphism search in large graph databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2013, 337–348

    Google Scholar 

  23. Zhu K, Zhang Y, Lin X M, Zhu G P, Wang W. NOVA: a novel and efficient framework for finding subgraph isomorphism mappings in large graphs. In: Proceedings of the 15th International Conference on Database Systems for Advanced Applications. 2010, 140–154

    Chapter  Google Scholar 

  24. Zou L, Chen L, Özsu M T. Distance Join: pattern match query in a large graph database. Proceedings of the VLDB Endowment, 2009, 2(1): 886–897

    Article  Google Scholar 

  25. Yu J X, Zeng X G, Cheng J F. Top-k graph pattern matching over large graphs. In: Proceedings of IEEE International Conference on Data Engineering. 2013, 1033–1044

    Google Scholar 

  26. Bruno N, Koudas N, Srivastava D. Holistic twig joins: optimal XML pattern matching. In: Proceedings of ACM SIGMOD International Conference on Management of Data. 2002, 310–321

    Google Scholar 

  27. Jiang H F, Wang W, Lu H J, Yu J X. Holistic twig joins on indexed XML documents. In: Proceedings of the 29th International Conference on Very Large Data Bases. 2003, 273–284

    Google Scholar 

  28. Karypis G, Kumar V. Analysis of multilevel graph partitioning. In: Proceedings of ACM/IEEE Conference on Supercomputing. 1995, 29

    Google Scholar 

  29. Wang L, Xiao Y H, Shao B, Wang H X. How to partition a billion-node graph. In: Proceedings of the 30th International Conference on Data Engineering. 2014, 568–579

    Google Scholar 

  30. Tretyakov K, Armas-Cervantes A, García-Banuelos L, Vilo J, Dumas M. Fast fully dynamic landmark-based estimation of shortest path distances in very large graphs. In: Proceedings of the 20th ACM Conference on Information and Knowledge Management. 2011, 1785–1794

    Google Scholar 

  31. Hoffart J, Suchanek F M, Berberich K, Weikum G. YAGO2: a spatially and temporally enhanced knowledge base from Wikipedia. Artificial Intelligence, 2013, 194: 28–61

    Article  MathSciNet  MATH  Google Scholar 

  32. Kim J, Shin H, Han W S, Hong S, Chafi H. Taming subgraph isomorphism for RDF query processing. Proceedings of the VLDB Endowment, 2015, 8(11): 1238–1249

    Article  Google Scholar 

  33. Ullmann J R. An algorithm for subgraph isomorphism. Journal of the ACM, 1976, 23(1): 31–42

    Article  MathSciNet  Google Scholar 

  34. Cordella L P, Foggia P, Sansone C, Vento M. A (sub) graph isomorphism algorithm for matching large graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(10): 1367–1372

    Article  Google Scholar 

  35. Peng P, Zou L, Chen L, Lin X M, Zhao D Y. Answering subgraph queries over massive disk resident graphs. World Wide Web: Internet and Web Information Systems, 2016, 19(3): 417–448

    Article  Google Scholar 

  36. Lee J, Han W S, Kasperovics R, Lee J H. An in-depth comparison of subgraph isomorphism algorithms in graph databases. Proceedings of the VLDB Endowment, 2012, 6(2): 133–144

    Article  Google Scholar 

  37. Ren X G, Wang J H. Exploiting vertex relationships in speeding up subgraph isomorphism over large graphs. Proceedings of the VLDB Endowment, 2015, 8(5): 617–628

    Article  Google Scholar 

  38. Afrati F N, Fotakis D, Ullman J D. Enumerating subgraph instances using Map-Reduce. In: Proceedings of the 29th International Conference on Data Engineering. 2013, 62–73

    Google Scholar 

  39. Fan W F, Wang X, Wu Y H, Deng D. Distributed graph simulation: impossibility and possibility. Proceedings of the VLDB Endowment, 2014, 7(12): 1083–1094

    Article  Google Scholar 

  40. Fan W F, Li J Z, Ma S, Tang N, Wu Y H, Wu Y P. Graph pattern matching: from intractable to polynomial time. Proceedings of the VLDB Endowment, 2010, 3(1): 264–275

    Article  Google Scholar 

  41. Fan W F, Wang X, Wu Y H. Diversified top-k graph pattern matching. Proceedings of the VLDB Endowment, 2013, 6(13): 1510–1521

    Article  Google Scholar 

  42. Ma S, Cao Y, Huai J P, Wo T Y. Distributed graph pattern matching. In: Proceedings of the 21st World Wide Web Conference. 2012, 949–958

    Google Scholar 

  43. Khan A, Li N, Yan X F, Guan Z Y, Chakraborty S, Tao S. Neighborhood based fast graph search in large networks. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2011, 901–912

    Google Scholar 

  44. Buneman P, Cong G, Fan W F, Kementsietsidis A. Using partial evaluation in distributed query evaluation. In: Proceedings of the 32nd International Conference on Very Large Data Bases. 2006, 211–222

    Google Scholar 

  45. Cong G, Fan W F, Kementsietsidis A. Distributed query evaluation with performance guarantees. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2007, 509–520

    Google Scholar 

  46. Cong G, Fan W F, Kementsietsidis A, Li J Z, Liu X M. Partial evaluation for distributed XPath query processing and beyond. ACM Transactions on Database Systems, 2012, 37(4): 32

    Article  Google Scholar 

  47. Fan W F, Wang X, Wu Y H. Performance guarantees for distributed reachability queries. Proceedings of the VLDB Endowment, 2012, 5(11): 1304–1315

    Article  Google Scholar 

Download references

Acknowledgements

This work was partially supported by the National Key Research and Development Program of China (2016YFB1000603), Fundamental Research Funds for the Central Universities, the National Natural Science Foundation of China (Grant Nos. 61622201, 61472131, and 61272546), and Science and Technology Key Projects of Hunan Province (2015 TP1004).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Lei Zou.

Additional information

Peng Peng received his BS and PhD degrees in computer science from Beijing Normal University, China and Peking University, China in 2009 and 2016, respectively. He is currently an assistant professor at Hunan University, China. His research interests include graph databases and distributed database systems.

Lei Zou received his BS and PhD degrees in computer science from Huazhong University of Science and Technology, China in 2003 and 2009, respectively. He is currently an associate professor at Peking University, China. His research interests include graph databases and knowledge graph data management.

Zhenqin Du was an intern student in the Institute of Computer Science and Technology of Peking University (PKU), China. He joined the research project in the ICST of PKU concerning graph data management and knowledge graph application.

Dongyan Zhao received the BS, MS, and PhD degrees from Peking University (PKU), China in 1991, 1994, and 2000, respectively. He is currently a professor at PKU. His research interests are in information processing and knowledge management, including computer networks and intelligent agents.

Electronic supplementary material

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Peng, P., Zou, L., Du, Z. et al. Using partial evaluation in holistic subgraph search. Front. Comput. Sci. 12, 966–983 (2018). https://doi.org/10.1007/s11704-016-5522-6

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11704-016-5522-6

Keywords