Abstract
Multiple pattern detection is needed in applications like disease analysis over gene networks, bug detection in program flow networks. This paper takes pattern detection to investigate the evaluation and optimization of multiple jobs in existing distributed graph processing frameworks. The evaluation plan for multiple pattern detection should be parallelizable and can capture and reuse the shared parts among pattern queries easily. In this paper, we design a path-based holistic plan for multiple pattern queries. Specifically, (1) we design a path-based edge-covered plan for an individual pattern. The paths in the plan can be easily captured and reused among different queries. Additionally, the evaluation plan is fully parallelizable, in which each data vertex performs necessary join operations independently during exploring graph. (2) We extend the individual plan to a holistic evaluation plan for multiple queries, whose results are equivalent to those of individual queries. The plan reduces the overall cost by finding frequent paths among queries and reusing the shared part in the holistic plan. (3) We devise various optimization strategies over the holistic plan. The experimental studies, conducted on Giraph, illustrate the high effectiveness of our holistic approaches.
Similar content being viewed by others
References
McCune, R., Weninger, T., Madey, G.: Thinking like a vertex: a survey of vertex-centric frameworks for large-scale distributed graph processing. ACM Comput. Surv. 48(2), 25 (2015)
Cheng, J., Yu, J., Ding, B., Yu, P., Wang, H.: Fast graph pattern matching. In: ICDE, pp. 913–922 (2008)
Lee, J., Han, W., Kasperovics, R., Lee, J.: An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB 6(2), 133–144 (2012)
Ullmann, J.: An algorithm for subgraph isomorphism. J. ACM 23, 31–42 (1976)
Gao, J., Zhou, C., Yu, J.: Towards continuous pattern detection over evolving large graph with snapshot isolation. VLDB J. 25(2), 269–290 (2016)
Gao, J., Zhou, C., Zhou, J., Yu, J.: Continuous pattern detection over billion-edge graph using distributed framework. In: ICDE, pp. 556–567 (2014)
Sun, P.: The human drug-disease-gene network. Inf. Sci. 306, 70–80 (2015)
Nguyen, T., Nguyen, H., Pham, N., AI-Kofahi, J., Nguyen, T.: Graph-based mining of multiple object usage patterns. In: SIGSOFT, pp. 383–392 (2009)
Apache Giraph. http://incubator.apache.org/giraph/
Malewicz, G., Austern, M., Bik, A., Dehnert, J., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: SIGMOD, pp. 135–146 (2010)
Salihoglu, S., Widom, J.: Optimizing graph algorithms on pregel-like systems. PVLDB 7, 577–588 (2014)
Wang, G., Chan, C.: Multiquery optimization in mapreduce framework. PVLDB 7(3), 145–156 (2013)
Elghandour, I., Aboulnaga, A.: Restore: reusing results of mapreduce jobs. PVLDB 5(6), 586–597 (2012)
Nykiel, T., Potamias, M., Mishra, C., Kollios, G., Koudas, N.: Mrshare: sharing across multiple queries in MapReduce. PVLDB 3(1), 137–150 (2010)
Shao, B., Wang, H., Li, Y.: Trinity: a distributed graph engine on a memory cloud. In: SIGMOD, pp. 505–516 (2013)
Sun, Z., Wang, H., Wang, H., Shao, B., Li, J.: Efficient subgraph matching on billion node graphs. PVLDB 5(9), 788–799 (2012)
Roy, P., Seshadri, S., Sudarshan, S., Bhobe, S.: Efficient and extensible algorithms for multi query optimization. In: SIGMOD, pp. 249–260 (2000)
Sellis, T., Ghosh, S.: On the multiple-query optimization problem. IEEE Trans. Knowl. Data Eng. 2(2), 262–266 (1990)
Diao, Y., Franklin, M.J.: High-performance XML filtering: an overview of YFilter. IEEE Data Eng. Bull. (DEBU) 26(1), 41–48 (2003)
Diao, Y., Rizvi, S., Franklin, M.J.: Towards an internet-scale XML dissemination service. In: VLDB, pp. 612–623 (2002)
Le, W., Kementsietsidis, A., Duan, S., Li, F.: Scalable multi-query optimization for SPARQL. In: ICDE, pp. 666–677 (2012)
Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.: Distributed graphlab: a framework for machine learning in the cloud. PVLDB 5(8), 716–727 (2012)
Haewoon, K., Changhyun, L., Hosung, P., Sue, M.: What is Twitter, a social network or a news media? In: WWW, pp. 591–600 (2010)
Huang, J., Abadi, D., Ren, K.: Scalable SPARQL querying of large RDF graphs. PVLDB 4(11), 1123–1134 (2011)
He, H., Singh, A.: Graphs-at-a-time: query language and access methods for graph databases. In SIGMOD, pp. 405–418 (2008)
Acknowledgements
This work was partially supported by NSFC under Grant Nos. 61272156 and 61572040, Shenzhen Gov Research Project No. JCYJ20151014093505032, National Key Research and Development Program No. 2016YFB1000700, and Research Grants Council of the Hong Kong SAR, China Nos. 14209314 and 418512.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Gao, J., Liu, Y., Zhou, C. et al. Path-based holistic detection plan for multiple patterns in distributed graph frameworks. The VLDB Journal 26, 327–345 (2017). https://doi.org/10.1007/s00778-016-0452-3
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-016-0452-3