
Path-based holistic detection plan for multiple patterns in distributed graph frameworks

  • Regular Paper
The VLDB Journal

Abstract

Multiple pattern detection is needed in applications such as disease analysis over gene networks and bug detection in program flow networks. This paper uses pattern detection as a case study to investigate the evaluation and optimization of multiple jobs in existing distributed graph processing frameworks. An evaluation plan for multiple pattern detection should be parallelizable and should make it easy to capture and reuse the parts shared among pattern queries. In this paper, we design a path-based holistic plan for multiple pattern queries. Specifically, (1) we design a path-based, edge-covered plan for an individual pattern. The paths in the plan can be easily captured and reused across different queries. In addition, the evaluation plan is fully parallelizable: each data vertex performs the necessary join operations independently while the graph is being explored. (2) We extend the individual plan to a holistic evaluation plan for multiple queries whose results are equivalent to those of evaluating the queries individually. The plan reduces the overall cost by finding frequent paths among the queries and reusing the shared parts in the holistic plan. (3) We devise several optimization strategies over the holistic plan. Experimental studies conducted on Giraph demonstrate the high effectiveness of our holistic approaches.
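The abstract does not spell out how paths are chosen or how sharing is detected, so the following is only a rough illustration of the two ideas in points (1) and (2), not the authors' plan-generation or cost-based selection algorithm. It assumes undirected, vertex-labelled pattern queries; all function names are hypothetical. `edge_covering_paths` greedily splits one pattern into edge-disjoint paths that together cover every edge (the intuition behind an edge-covered plan), and `frequent_paths` reports fixed-length label paths that occur in at least two queries, i.e. candidates that a holistic plan might evaluate once and reuse. The distributed, vertex-centric evaluation on Giraph is not modelled here.

```python
# Hypothetical sketch, not the paper's algorithm. Assumes undirected,
# vertex-labelled pattern queries given as (edge list, vertex -> label).
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]
Labels = Dict[int, str]


def _adjacency(edges: List[Edge]) -> Dict[int, Set[int]]:
    """Undirected adjacency sets for a small pattern graph."""
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def edge_covering_paths(edges: List[Edge]) -> List[List[int]]:
    """Greedily split a pattern into edge-disjoint paths covering every edge.

    A path ends when it runs out of unused edges or when its last edge
    closes a cycle (its final vertex then repeats an earlier one).
    """
    adj = _adjacency(edges)
    paths: List[List[int]] = []
    while any(adj.values()):
        # Start at an odd-degree vertex if one exists, else any vertex with edges left.
        odd = [v for v in adj if len(adj[v]) % 2 == 1]
        start = min(odd) if odd else min(v for v in adj if adj[v])
        path, seen, cur = [start], {start}, start
        while adj[cur]:
            nxt = min(adj[cur])
            adj[cur].discard(nxt)  # consume the edge in both directions
            adj[nxt].discard(cur)
            path.append(nxt)
            if nxt in seen:  # this edge closes a cycle; end the path here
                break
            seen.add(nxt)
            cur = nxt
        paths.append(path)
    return paths


def label_paths(edges: List[Edge], labels: Labels, length: int) -> Set[Tuple[str, ...]]:
    """Label sequences of all simple paths with `length` edges, up to reversal."""
    adj = _adjacency(edges)
    found: Set[Tuple[str, ...]] = set()

    def extend(path: List[int]) -> None:
        if len(path) == length + 1:
            seq = tuple(labels[v] for v in path)
            found.add(min(seq, seq[::-1]))  # canonical form for undirected paths
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:
                extend(path + [nxt])

    for v in list(adj):
        extend([v])
    return found


def frequent_paths(queries: Dict[str, Tuple[List[Edge], Labels]],
                   length: int, min_support: int = 2) -> Dict[Tuple[str, ...], Set[str]]:
    """Label paths found in at least `min_support` queries, mapped to those queries."""
    users: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for name, (edges, labels) in queries.items():
        for sig in label_paths(edges, labels, length):
            users[sig].add(name)
    return {sig: qs for sig, qs in users.items() if len(qs) >= min_support}


if __name__ == "__main__":
    print(edge_covering_paths([(0, 1), (1, 2), (2, 0)]))  # [[0, 1, 2, 0]]
    queries = {
        "Q1": ([(0, 1), (1, 2), (2, 0)], {0: "A", 1: "B", 2: "C"}),            # triangle
        "Q2": ([(0, 1), (1, 2), (2, 3)], {0: "A", 1: "B", 2: "C", 3: "D"}),    # chain
        "Q3": ([(0, 1), (1, 2), (2, 3)], {0: "B", 1: "C", 2: "D", 3: "E"}),    # chain
    }
    shared = frequent_paths(queries, length=2)
    # [(('A', 'B', 'C'), ['Q1', 'Q2']), (('B', 'C', 'D'), ['Q2', 'Q3'])]
    print(sorted((sig, sorted(qs)) for sig, qs in shared.items()))
```

In this toy example the label path A–B–C is shared by Q1 and Q2 and B–C–D by Q2 and Q3, so each would be a candidate for shared evaluation. Exhaustive enumeration of fixed-length paths is chosen purely for clarity; it grows quickly with pattern size, and the paper's own selection of frequent paths and its optimization strategies are described in the full text.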


Figs. 1–14 (not reproduced in this preview)


Notes

  1. http://snap.stanford.edu/data/.

  2. http://konect.uni-koblenz.de.


Acknowledgements

This work was partially supported by NSFC under Grant Nos. 61272156 and 61572040, Shenzhen Gov Research Project No. JCYJ20151014093505032, National Key Research and Development Program No. 2016YFB1000700, and Research Grants Council of the Hong Kong SAR, China Nos. 14209314 and 418512.

Author information

Correspondence to Jun Gao.


About this article


Cite this article

Gao, J., Liu, Y., Zhou, C. et al. Path-based holistic detection plan for multiple patterns in distributed graph frameworks. The VLDB Journal 26, 327–345 (2017). https://doi.org/10.1007/s00778-016-0452-3


  • DOI: https://doi.org/10.1007/s00778-016-0452-3
