Accelerating reachability query processing based on DAG reduction

  • Regular Paper
The VLDB Journal

Abstract

Answering reachability queries is one of the fundamental graph operations. Existing approaches build indexes and answer reachability queries on a directed acyclic graph (DAG) \(G\), which is constructed by coalescing each strongly connected component of the given directed graph \(\mathcal {G}\) into a single node of \(G\). Because \(G\) can still be too large to process efficiently, some studies further reduce \(G\) to a smaller graph. However, these approaches either answer reachability queries inefficiently or cannot scale to large graphs. In this paper, we study DAG reduction to accelerate reachability query processing: we reduce the size of \(G\) by computing a transitive reduction (TR) followed by an equivalence reduction (ER). For TR, we propose a bottom-up algorithm, buTR, which removes all redundant edges from \(G\) to obtain the unique smallest DAG \(G^{t}\) that has the same transitive closure as \(G\). For ER, we propose a divide-and-conquer algorithm, linear-ER. Given the result \(G^{t}\) of TR, linear-ER obtains a smaller DAG \(G^{\varepsilon }\) in linear time based on the equivalence relationship between nodes in \(G\). Our DAG reduction approaches (TR and ER) significantly reduce time and space costs and scale to large graphs. Based on the result of DAG reduction, we further propose a graph decomposition-based algorithm to answer reachability queries efficiently. We confirm the efficiency of our approaches through extensive experimental studies of TR, ER, and reachability query processing on 20 real datasets. The complete source code is available for download at https://pan.baidu.com/s/1skHBXXN.
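To make the two reduction steps concrete, the following is a minimal Python sketch on a toy DAG. It is not buTR or linear-ER: it uses a naive, memoized descendant computation that is far from linear time, and it assumes that two nodes are equivalent when they have the same ancestor set and the same descendant set, which the abstract does not spell out. The function names and the adjacency-dictionary representation are hypothetical choices for illustration only.

```python
def descendants(adj):
    """Map each node to the set of nodes reachable from it (itself excluded).

    `adj` maps every node of a DAG to the set of its direct successors.
    Naive memoized DFS; intended for small illustrative graphs only.
    """
    memo = {}

    def visit(u):
        if u not in memo:
            reach = set()
            for v in adj[u]:
                reach.add(v)
                reach |= visit(v)
            memo[u] = reach
        return memo[u]

    for u in adj:
        visit(u)
    return memo


def transitive_reduction(adj):
    """Naive TR: drop edge (u, v) if v is reachable via another successor of u."""
    desc = descendants(adj)
    return {
        u: {v for v in succs
            if not any(v in desc[w] for w in succs if w != v)}
        for u, succs in adj.items()
    }


def equivalence_reduction(adj):
    """Naive ER (assumed definition): merge nodes sharing ancestors and descendants.

    Returns the merged graph and a map from each node to its representative.
    """
    desc = descendants(adj)
    anc = {u: set() for u in adj}
    for u, reach in desc.items():
        for d in reach:
            anc[d].add(u)

    # Group nodes by their (ancestors, descendants) signature.
    classes = {}
    for u in adj:
        key = (frozenset(anc[u]), frozenset(desc[u]))
        classes.setdefault(key, []).append(u)

    # Use the smallest member of each class as its representative.
    rep = {u: min(members) for members in classes.values() for u in members}

    merged = {r: set() for r in set(rep.values())}
    for u, succs in adj.items():
        for v in succs:
            merged[rep[u]].add(rep[v])
    return merged, rep


# Toy DAG: edge a -> d is redundant; b and c are equivalent.
toy = {"a": {"b", "c", "d"}, "b": {"d"}, "c": {"d"}, "d": set()}
toy_tr = transitive_reduction(toy)
toy_er, toy_rep = equivalence_reduction(toy_tr)
```

In the toy graph, the TR step removes the edge from a to d because d is already reachable through b, and the ER step then merges b and c because both have a as their only ancestor and d as their only descendant. A reachability query on the original nodes can be answered on the reduced graph by first mapping each endpoint to its representative.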

Notes

  1. \(C_{u}\) is defined without topological levels, while \(C_{1}\) in Definition 2 is defined with topological levels. For example, for node \(v_3\) in Fig. 6, \(C_{v_3}=\{v_4, v_8, v_{13}, v_{16}\}\), while \(C_{1}=\emptyset \) for \(v_3\).

  2. The two topo-orders used in FELINE [26] are not DT-orders, and the cost of getting the second one is \(O(|V|\log |V|+|E|)\).

  3. The parameter is set to \(k=5\).

  4. The parameters are set to \(k=2\), \(h=2\), and \(\mu =100\).

  5. For both BFL and BFL\(^+\), \(k=2\) and \(s=32k\).

  6. https://code.google.com/archive/p/grail/downloads.

  7. http://snap.stanford.edu/data/index.html.

  8. http://www.uniprot.org/.

  9. http://pan.baidu.com/s/1bpHkFJx.

  10. http://pan.baidu.com/s/1c00Jq5E.

  11. https://code.google.com/p/ferrari-index/downloads/list.

References

  1. Agrawal, R., Borgida, A., Jagadish, H.V.: Efficient management of transitive relationships in large data and knowledge bases. In: SIGMOD, pp. 253–262 (1989)

  2. Aho, A.V., Garey, M.R., Ullman, J.D.: The transitive reduction of a directed graph. SIAM J. Comput. 1(2), 131–137 (1972)

  3. Boldi, P., Santini, M., Vigna, S.: A large time-aware web graph. SIGIR Forum 42(2), 33–38 (2008)

  4. Cha, M., Haddadi, H., Benevenuto, F., Gummadi, P.K.: Measuring user influence in twitter: the million follower fallacy. In: ICWSM (2010)

  5. Cheng, J., Huang, S., Wu, H., Fu, A.W.: TF-label: a topological-folding labeling scheme for reachability querying in a large graph. In: SIGMOD, pp. 193–204 (2013)

  6. Cohen, E.: Estimating the size of the transitive closure in linear time. In: 35th Annual Symposium on Foundations of Computer Science, pp. 190–200 (1994)

  7. Cohen, E., Halperin, E., Kaplan, H., Zwick, U.: Reachability and distance queries via 2-hop labels. In: ACM-SIAM, pp. 937–946 (2002)

  8. Fan, W., Li, J., Wang, X., Wu, Y.: Query preserving graph compression. In: SIGMOD, pp. 157–168 (2012)

  9. Habib, M., Morvan, M., Rampon, J.: On the calculation of transitive reduction—closure of orders. Discrete Math. 111(1–3), 289–303 (1993)

  10. Jiang, H., Wang, W., Lu, H., Yu, J.X.: Holistic twig joins on indexed XML documents. In: VLDB, pp. 273–284 (2003)

  11. Jin, R., Ruan, N., Dey, S., Yu, J.X.: SCARAB: scaling reachability computation on large graphs. In: SIGMOD, pp. 169–180 (2012)

  12. Jin, R., Ruan, N., Xiang, Y., Wang, H.: Path-tree: an efficient reachability indexing scheme for large directed graphs. ACM Trans. Database Syst. 36(1), 7 (2011)

  13. Jin, R., Wang, G.: Simple, fast, and scalable reachability oracle. PVLDB 6(14), 1978–1989 (2013)

  14. Jin, R., Xiang, Y., Ruan, N., Fuhry, D.: 3-hop: a high-compression indexing scheme for reachability query. In: SIGMOD, pp. 813–826 (2009)

  15. Jin, R., Xiang, Y., Ruan, N., Wang, H.: Efficiently answering reachability queries on very large directed graphs. In: SIGMOD, pp. 595–608 (2008)

  16. Katajainen, J., Träff, J.L.: A meticulous analysis of mergesort programs. In: CIAC’97, pp. 217–228 (1997)

  17. Kornaropoulos, E.M., Tollis, I.G.: Weak dominance drawings and linear extension diameter. CoRR. arXiv:1108.1439 [cs.DS] (2011)

  18. Ma, T.-H., Spinrad, J.: Transitive closure for restricted classes of partial orders. Order 8(2), 175–183 (1991)

  19. Seufert, S., Anand, A., Bedathur, S.J., Weikum, G.: FERRARI: flexible and efficient reachability range assignment for graph indexing. In: ICDE, pp. 1009–1020 (2013)

  20. Simon, K.: An improved algorithm for transitive closure on acyclic digraphs. Theor. Comput. Sci. 58, 325–346 (1988)

  21. Su, J., Zhu, Q., Wei, H., Yu, J.X.: Reachability querying: Can it be even faster? TKDE 29(3), 683–697 (2017)

  22. Tarjan, R.E.: Depth-first search and linear graph algorithms. SIAM J. Comput. 1(2), 146–160 (1972)

  23. Trißl, S., Leser, U.: Fast and practical indexing and querying of very large graphs. In: SIGMOD, pp. 845–856 (2007)

  24. Valdes, J., Tarjan, R.E., Lawler, E.L.: The recognition of series parallel digraphs. SIAM J. Comput. 11(2), 298–313 (1982)

  25. van Schaik, S.J., de Moor, O.: A memory efficient reachability data structure through bit vector compression. In: SIGMOD, pp. 913–924 (2011)

  26. Veloso, R.R., Cerf, L., Junior, W.M., Zaki, M.J.: Reachability queries in very large graphs: a fast refined online search approach. In: EDBT, pp. 511–522 (2014)

  27. Wei, H., Yu, J.X., Lu, C., Jin, R.: Reachability querying: an independent permutation labeling approach. PVLDB 7(12), 1191–1202 (2014)

  28. Williams, V.V.: Multiplying matrices faster than Coppersmith–Winograd. In: STOC, pp. 887–898 (2012)

  29. Yano, Y., Akiba, T., Iwata, Y., Yoshida, Y.: Fast and scalable reachability queries on graphs by pruned labeling with landmarks and paths. In: CIKM, pp. 1601–1606 (2013)

  30. Yildirim, H., Chaoji, V., Zaki, M.J.: GRAIL: scalable reachability index for large graphs. PVLDB 3(1), 276–284 (2010)

  31. Yildirim, H., Chaoji, V., Zaki, M.J.: GRAIL: a scalable index for reachability queries in very large graphs. VLDB J. 21(4), 509–534 (2012)

  32. Zhou, J., Zhou, S., Yu, J.X., Wei, H., Chen, Z., Tang, X.: DAG reduction: fast answering reachability queries. In: SIGMOD, pp. 375–390 (2017)

  33. Zhu, A.D., Lin, W., Wang, S., Xiao, X.: Reachability queries on large dynamic graphs: a total order approach. In: SIGMOD, pp. 1323–1334 (2014)

Acknowledgements

This work was partly supported by grants from the Natural Science Foundation of China (No. 61472339, 61303040, 61572421, 61272124), and Jeffrey Xu Yu was partly supported by the grant of the Research Grants Council of Hong Kong SAR, China, No. 14209314 and No. 14221716.

Author information

Corresponding author

Correspondence to Junfeng Zhou.

About this article

Cite this article

Zhou, J., Yu, J.X., Li, N. et al. Accelerating reachability query processing based on DAG reduction. The VLDB Journal 27, 271–296 (2018). https://doi.org/10.1007/s00778-018-0495-8

  • DOI: https://doi.org/10.1007/s00778-018-0495-8
