Para-G: Path pattern query processing on large graphs

World Wide Web

Abstract

Graph data management and mining techniques have many and diverse applications in scientific research and business. Uniform path pattern query processing, one of the most basic operations on graph data, faces three major challenges, which this paper addresses as follows. First, a new graph query language called G-Path is presented; it focuses on complex path pattern query processing on very large graphs. The design of a system called Para-G is also proposed; it is based on a BSP-like model as well as the MapReduce model, and can effectively handle distributed graph data operations and queries. Second, an implementation of Para-G on Hadoop, the de facto cloud platform, is described, and the processing of a G-Path statement in Para-G is detailed based on the concept of a distributed path finite state automaton. In addition, several query optimization techniques are applied to G-Path queries to dramatically improve execution performance. Finally, extensive experiments on several graph data sets show the usability of the G-Path query language and the effectiveness of Para-G.
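
The abstract does not show G-Path syntax or the internals of Para-G, so the following is only a generic, single-process sketch of the underlying idea: matching a regular path pattern by moving automaton states along graph edges, one round at a time, in the spirit of BSP supersteps. The function name `match_paths`, the message layout, and the example graph are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def match_paths(edges, transitions, start_state, accept_states, sources):
    """Return (source, target) pairs joined by a path whose edge labels
    drive the automaton from start_state to an accepting state.

    Hypothetical helper for illustration; not the Para-G algorithm.
    """
    # Adjacency list: vertex -> list of (edge_label, neighbor).
    adj = defaultdict(list)
    for u, label, v in edges:
        adj[u].append((label, v))

    # A message is (origin vertex, current vertex, automaton state).
    frontier = {(s, s, start_state) for s in sources}
    seen = set(frontier)
    results = set()

    # Each loop iteration plays the role of one superstep: every pending
    # message is processed, and new messages are delivered in the next round.
    while frontier:
        next_frontier = set()
        for origin, vertex, state in frontier:
            if state in accept_states:
                results.add((origin, vertex))
            for label, neighbor in adj[vertex]:
                for next_state in transitions.get((state, label), ()):
                    msg = (origin, neighbor, next_state)
                    if msg not in seen:  # avoids looping forever on cycles
                        seen.add(msg)
                        next_frontier.add(msg)
        frontier = next_frontier
    return results


# Assumed example: the pattern "knows+ worksAt" as a small automaton.
transitions = {
    (0, "knows"): {1},
    (1, "knows"): {1},
    (1, "worksAt"): {2},
}
edges = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("c", "knows", "a"),
    ("c", "worksAt", "X"),
    ("a", "worksAt", "Y"),
]
pairs = match_paths(edges, transitions, 0, {2}, ["a"])
print(sorted(pairs))
```

In a distributed setting, the messages in `frontier` would be partitioned by the vertex they are addressed to, and the `seen` set would be kept per vertex; the sketch keeps both in one process for brevity.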

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (No. 61170064, No. 61373023) and the National High Technology Research and Development Program of China (No. 2013AA013204).

Author information

Corresponding author

Correspondence to Chaokun Wang.

About this article

Cite this article

Bai, Y., Wang, C. & Ying, X. Para-G: Path pattern query processing on large graphs. World Wide Web 20, 515–541 (2017). https://doi.org/10.1007/s11280-016-0401-5

  • DOI: https://doi.org/10.1007/s11280-016-0401-5
